I noticed whilst writing some new performance benchmark flows that if a checkpoint stored in the database cannot be deserialized, the node will not start. In the perf cluster this leaves us in a loop, repeatedly trying to restart the node. We should at least catch the failure for each individual flow that cannot be deserialized during node restart, log its flow ID and an appropriate message, and then continue with the next flow. The same check should also apply to any flow "retried" from the Flow Hospital.
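To make the proposed behaviour concrete, here is a minimal sketch of a restore loop that isolates failures per flow. It is not the node's actual code: `CheckpointRestore`, `Result`, `restoreAll` and the `deserializer` parameter are all hypothetical names, and the real checkpoint storage and serialization APIs will look different. It only illustrates catching the failure for one checkpoint, recording the flow ID, and carrying on with the next.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: none of these names are real Corda APIs.
final class CheckpointRestore {

    /** Outcome of one restore pass: flows that loaded, and flow IDs that did not. */
    static final class Result<F> {
        final List<F> restored = new ArrayList<>();
        final Map<String, Exception> failed = new LinkedHashMap<>();
    }

    /**
     * Attempts to deserialize every checkpoint, keyed by flow ID.
     * A failure on one checkpoint is logged and recorded, and does not
     * stop the remaining checkpoints from being restored.
     */
    static <F> Result<F> restoreAll(Map<String, byte[]> checkpoints,
                                    Function<byte[], F> deserializer) {
        Result<F> result = new Result<>();
        for (Map.Entry<String, byte[]> entry : checkpoints.entrySet()) {
            String flowId = entry.getKey();
            try {
                result.restored.add(deserializer.apply(entry.getValue()));
            } catch (Exception e) {
                // A real node would use its own logger; stderr keeps the sketch self-contained.
                System.err.println("Unable to deserialize checkpoint for flow "
                        + flowId + ", skipping it: " + e);
                result.failed.put(flowId, e);
            }
        }
        return result;
    }
}
```

The `failed` map is kept rather than discarded so that whatever later handles these flows (for example the Flow Hospital) has the flow ID and the cause to hand. The sketch deliberately catches `Exception` and not `Throwable`, so that errors such as running out of memory still stop startup.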
In this instance I had skipped the dev mode checkpoint checker (and I guess others could too). It is also possible that someone could inject a bad checkpoint into the database as a DoS attempt or, perhaps more likely, that a version upgrade or a bug could produce the same result.
It might also be nice to represent the flow as failed in some way in the Flow Hospital and offer some mechanism to clean it up. Perhaps this work belongs in a further Story once the initial bug has been fixed, since it only becomes useful with, and depends on, future Flow Hospital work.