Uploaded image for project: 'Corda'
  1. CORDA-2566

Deadlock between database and AppendOnlyPersistentMap

    Details

    • Severity:
      Critical
    • Target Version/s:
    • Feature Team:
      Performance and Platform Sustainability
    • Sprint:

      Description

      We observed nodes locking up on Azure and OVH. This is output captured from a run on OVH with Rick's cache locking branch (https://github.com/corda/enterprise/tree/parkri-cache-locking).

      A thread on the receiving node holds a lock in the PersistentMap, while waiting to read from the DB, see stack-trace below.

      "flow-worker" #1658 daemon prio=5 os_prio=0 tid=0x00007fa9e40a9800 nid=0x811d runnable [0x00007fa66a0be000]
         java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              at java.net.SocketInputStream.read(SocketInputStream.java:171)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:999)
              at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:989)
              at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
              at sun.security.ssl.InputRecord.read(InputRecord.java:503)
              at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
              - locked <0x00007fac6d4b7f38> (a java.lang.Object)
              at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
              at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
              - locked <0x00007fac6d4b9b70> (a sun.security.ssl.AppInputStream)
              at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1971)
              at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6362)
              - locked <0x00007fb3e20001a8> (a com.microsoft.sqlserver.jdbc.TDSReader)
              at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7620)
              at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:576)
              at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:508)
              at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7233)
              at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2869)
              - locked <0x00007fac6d4c2bf0> (a java.lang.Object)
              at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:243)
              at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:218)
              at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:461)
              at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
              at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
              at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.executeUpdate(ResultSetReturnImpl.java:175)
              at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3171)
              at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3686)
              at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:90)
              at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:604)
              at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:478)
              at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:356)
              at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39)
              at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1454)
              at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1440)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase.loadValue(AppendOnlyPersistentMap.kt:144)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase.access$loadValue(AppendOnlyPersistentMap.kt:22)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase$transactionalLoadValue$3.invoke(AppendOnlyPersistentMap.kt:159)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase$Transactional$Unknown$valueWithoutIsolationDelegate$1.invoke(AppendOnlyPersistentMap.kt:279)
              at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
              - locked <0x00007fb3e70008e0> (a kotlin.SynchronizedLazyImpl)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase$Transactional$Unknown.isPresent(AppendOnlyPersistentMap.kt:277)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase$Transactional.orElse(AppendOnlyPersistentMap.kt:250)
              at net.corda.node.utilities.AppendOnlyPersistentMapBase.get(AppendOnlyPersistentMap.kt:40)
              at net.corda.node.services.identity.PersistentIdentityService$certificateFromKey$1.invoke(PersistentIdentityService.kt:156)
              at net.corda.node.services.identity.PersistentIdentityService$certificateFromKey$1.invoke(PersistentIdentityService.kt:33)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:219)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:199)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:205)
              at net.corda.node.services.identity.PersistentIdentityService.certificateFromKey(PersistentIdentityService.kt:156)
              at net.corda.node.services.identity.PersistentIdentityService.stripNotOurKeys(PersistentIdentityService.kt:194)
              at net.corda.node.services.keys.BasicHSMKeyManagementService$filterMyKeys$1.invoke(BasicHSMKeyManagementService.kt:106)
              at net.corda.node.services.keys.BasicHSMKeyManagementService$filterMyKeys$1.invoke(BasicHSMKeyManagementService.kt:32)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:219)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:199)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:205)
              at net.corda.node.services.keys.BasicHSMKeyManagementService.filterMyKeys(BasicHSMKeyManagementService.kt:105)
              at net.corda.node.services.vault.NodeVaultService$makeUpdates$1.invoke(NodeVaultService.kt:230)
              at net.corda.node.services.vault.NodeVaultService.makeUpdates(NodeVaultService.kt:285)
              at net.corda.node.services.vault.NodeVaultService.access$makeUpdates(NodeVaultService.kt:59)
              at net.corda.node.services.vault.NodeVaultService$notifyAll$2.invoke(NodeVaultService.kt:210)
              at net.corda.node.services.vault.NodeVaultService.notifyAll(NodeVaultService.kt:221)
              at net.corda.node.services.api.ServiceHubInternal$Companion$recordTransactions$1.invoke(ServiceHubInternal.kt:106)
              at net.corda.node.services.api.ServiceHubInternal$Companion$recordTransactions$1.invoke(ServiceHubInternal.kt:51)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:219)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:199)
              at net.corda.nodeapi.internal.persistence.CordaPersistence.transaction(CordaPersistence.kt:205)
              at net.corda.node.services.api.ServiceHubInternal$Companion.recordTransactions(ServiceHubInternal.kt:60)
              at net.corda.node.services.api.ServiceHubInternal$DefaultImpls.recordTransactions(ServiceHubInternal.kt:132)
              at net.corda.node.internal.AbstractNode$ServiceHubInternalImpl.recordTransactions(AbstractNode.kt:1005)
              at net.corda.core.internal.ResolveTransactionsFlow.call(ResolveTransactionsFlow.kt:102)
              at net.corda.core.internal.ResolveTransactionsFlow.call(ResolveTransactionsFlow.kt:25)
              at net.corda.node.services.statemachine.FlowStateMachineImpl.subFlow(FlowStateMachineImpl.kt:294)
              at net.corda.core.flows.FlowLogic.subFlow(FlowLogic.kt:311)
              at net.corda.core.flows.ReceiveTransactionFlow.call(ReceiveTransactionFlow.kt:46)
              at net.corda.core.flows.ReceiveTransactionFlow.call(ReceiveTransactionFlow.kt:27)
              at net.corda.node.services.statemachine.FlowStateMachineImpl.subFlow(FlowStateMachineImpl.kt:294)
              at net.corda.core.flows.FlowLogic.subFlow(FlowLogic.kt:311)
              at net.corda.core.flows.ReceiveFinalityFlow.call(FinalityFlow.kt:260)
      

      A different thread is likely holding a lock in the DB, while waiting for the lock on the AppendOnlyMap.

      The full stack-traces of the sending and receiving nodes are attached (node1 and node2).

      sp_who2 shows suspended selects.

        Attachments

          Issue links

            Activity

              People

              • Assignee:
                Christian.Sailer Christian Sailer
                Reporter:
                Thomas.Schroeter Thomas Schroeter
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: