I helped debug a deadlock related to the security cache today. Basically, custom (non-public) code for loading the security rules of a document saved another document to store some settings. Whenever a document is saved, all security cache entries related to that document are invalidated. As the computation of a security access entry (which determines whether a user has a certain right on a certain document) can use other information from the cache, invalidation is blocked by a lock while an entry is being computed, so that the new entry cannot be based on invalidated information. This results in a deadlock when the code for loading a security access entry triggers a document save.
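To make the failure mode concrete, here is a minimal sketch of how such a deadlock can arise. It assumes a read/write lock where loading holds the read lock and invalidation needs the write lock; the actual locking in the security cache may well differ, and `LockedCacheSketch` and its methods are hypothetical names. A timed `tryLock` is used so the example returns instead of hanging forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the security cache, not the real implementation.
class LockedCacheSketch {
    // Assumption: entry loading holds the read lock, invalidation needs the write lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Called by the save listener. Returns false if the lock was not obtained in time. */
    boolean invalidate(String documentReference, long timeoutMillis) {
        try {
            if (!lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // with a plain lock() this thread would block forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            // ... remove all cache entries related to documentReference ...
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Loads an entry with a rule loader that saves a settings document as a side effect. */
    boolean loadEntryThatSaves(String documentReference, long timeoutMillis) {
        lock.readLock().lock();
        try {
            // The save triggers invalidation on the same thread, which still holds
            // the read lock, so the write lock can never be granted.
            return invalidate("Some.SettingsDocument", timeoutMillis);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch, `loadEntryThatSaves` returns `false` because the nested invalidation times out, while a plain `invalidate` call outside of any load succeeds.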
The deadlock got me thinking about whether there is something we should do to prevent it. What I have in mind is the following idea:
- Let the document save event listener just write an invalidation request into a (blocking) queue.
- Let another thread wait on this queue for items to become available and if there is an item, wait for the lock and then batch-process all invalidation requests that are in the queue.
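The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are made up, documents are identified by plain strings, and the cache is reduced to a set of references guarded by a read/write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the queued design, not the real security cache.
class QueuedInvalidationSketch {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> cachedDocuments = ConcurrentHashMap.newKeySet();

    /** Save listener side: only enqueues, never touches the cache lock. */
    void onDocumentSaved(String documentReference) {
        requests.add(documentReference);
    }

    /** Entry loading: holds the read lock while the (possibly saving) loader runs. */
    void loadEntry(String documentReference, Runnable ruleLoader) {
        lock.readLock().lock();
        try {
            ruleLoader.run(); // may call onDocumentSaved without blocking
            cachedDocuments.add(documentReference);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Worker side: waits for one request, then drains and applies the whole batch. */
    int processNextBatch() {
        List<String> batch = new ArrayList<>();
        try {
            batch.add(requests.take()); // blocks until a request is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        }
        requests.drainTo(batch); // pick up everything else that is already queued
        lock.writeLock().lock(); // one lock acquisition for the whole batch
        try {
            batch.forEach(cachedDocuments::remove);
        } finally {
            lock.writeLock().unlock();
        }
        return batch.size();
    }

    boolean isCached(String documentReference) {
        return cachedDocuments.contains(documentReference);
    }
}
```

A dedicated thread would call `processNextBatch()` in a loop until it is interrupted. Note that between `onDocumentSaved` and the next batch, `isCached` still returns `true` for the saved document, which is exactly the window of stale entries that this design introduces.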
This design would avoid the deadlock I’ve seen. Further, I could imagine that this could speed up saving documents on busy instances, in particular when the security cache is not large enough to fit all security access entries that are currently needed.
However, there is also an important consequence: whenever a rights change is saved, we lose the property that, once the save has completed, all subsequent access control checks use the modified values. In fact, it could take an arbitrarily long time until the change has been applied.
Any opinions on this tradeoff?
There are also alternative solutions, like trying to accept invalidation requests while security access entries are being computed. I’m not sure if and how exactly this could work, and it certainly also depends on the exact implementation of the security cache, which could change as part of my planned redesign.