Asynchronous security cache invalidation

Today I helped debug a deadlock related to the security cache. In short, custom (non-public) code for loading the security rules of a document saved another document to store some settings. Whenever a document is saved, all security cache entries related to that document are invalidated. The computation of a security access entry (which determines whether a user has a certain right on a certain document) can use other information from the cache. For that reason, invalidation is disabled by a lock while the entry is computed, so that the new entry is never based on invalidated information. This results in a deadlock when the code that loads a security access entry triggers a document save: the save waits for the lock to be released, and the lock is only released once the computation, and therefore the save, has finished.
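To make the failure mode concrete, here is a minimal Java sketch of the lock pattern described above. It assumes (this is not taken from the actual code) that computations hold the read lock of a java.util.concurrent ReentrantReadWriteLock and that invalidation needs the write lock; DeadlockDemo, computeEntryWithSavingLoader and invalidateOnSave are hypothetical names. A real listener would call lock() and hang forever, so the sketch uses a timeout only to make the blocked invalidation observable.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the invalidation lock (not the real implementation).
class DeadlockDemo {
    // Assumption: computations hold the read lock, invalidation needs the write lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Invalidation triggered by a document save. Returns false if the write lock
    // could not be acquired within the timeout (a real listener would block forever).
    boolean invalidateOnSave(long timeoutMillis) throws InterruptedException {
        if (!lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            // ... remove all cache entries related to the saved document ...
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Computes an access entry with a custom loader that saves another document.
    boolean computeEntryWithSavingLoader() throws InterruptedException {
        lock.readLock().lock(); // invalidation is disabled from here on
        try {
            // The save runs the invalidation on this same thread, but a
            // ReentrantReadWriteLock cannot upgrade a held read lock to the write lock.
            return invalidateOnSave(100);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Calling computeEntryWithSavingLoader() returns false after the timeout, while invalidateOnSave(100) succeeds when no computation is running. If the save ran on another thread instead, it would block in the same way, because the read lock is still held.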

This got me thinking about whether there is something we should do to prevent this. Here is the idea I have in mind:

  1. Let the document save event listener only write an invalidation request into a (blocking) queue.
  2. Let another thread wait on this queue until items become available; when there is an item, the thread waits for the lock and then batch-processes all invalidation requests in the queue.
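The two steps above could look roughly like the following Java sketch. It is a hypothetical illustration, not existing code: the cache is modeled as a plain map from document reference to entry, computations are assumed to hold the read lock of a ReentrantReadWriteLock, and AsyncSecurityCacheInvalidator, onDocumentSaved and processNextBatch are made-up names. The save listener only enqueues and never touches the lock; the background thread takes one request, waits for the write lock and then drains whatever else has accumulated in the meantime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of queue-based asynchronous invalidation (not an existing API).
class AsyncSecurityCacheInvalidator {
    // Stand-in for the security cache: document reference -> computed access entry.
    final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: computations hold the read lock; batch invalidation takes the write lock.
    final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Step 1: the save listener only enqueues a request; it never waits for the lock.
    void onDocumentSaved(String documentReference) {
        queue.add(documentReference);
    }

    // Step 2: block until a request is available, then process everything queued.
    void processNextBatch() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take());   // waits until at least one request exists
        lock.writeLock().lock();   // waits for running computations to finish
        try {
            queue.drainTo(batch);  // also take requests that arrived meanwhile
            for (String documentReference : batch) {
                cache.remove(documentReference);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Starts the background thread that runs step 2 in a loop until interrupted.
    Thread start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    processNextBatch();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop
            }
        }, "security-cache-invalidator");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Because onDocumentSaved never waits for the lock, a loader that saves a document while a computation holds the read lock no longer deadlocks. The sketch also shows the downside: until processNextBatch has run, the stale entry is still in the cache.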

This design would avoid the deadlock I’ve seen. Further, I could imagine that this could speed up saving documents on busy instances, in particular when the security cache is not large enough to fit all security access entries that are currently needed.

However, there is also an important consequence: whenever a right change is saved, we lose the property that once the save has completed, all subsequent access control checks use the modified values. In fact, it could take an arbitrary amount of time until the change is applied.

Any opinions on this tradeoff?

There are also alternative solutions, like accepting invalidation requests while security access entries are being computed. I’m not sure whether and how exactly this could work, and it certainly also depends on the exact implementation of the security cache, which could change as part of my planned redesign.
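One way such an alternative might work, sketched under heavy assumptions: instead of holding a lock during the whole computation, the cache keeps a generation counter that every invalidation increments, and a computation only stores its result if the counter has not changed since it started. This is a generic optimistic-validation technique, not the planned redesign; GenerationCheckedCache and its methods are hypothetical. A lock is still used, but only for the few instructions that bump or compare the counter, never while documents are loaded, so a loader that saves a document cannot deadlock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: invalidation never waits for a running computation.
class GenerationCheckedCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final Object mutex = new Object(); // held only for tiny critical sections
    private long generation;                   // bumped by every invalidation; guarded by mutex

    // Called on document save: only waits for the short critical sections below.
    void invalidate(String documentReference) {
        synchronized (mutex) {
            generation++;
            entries.remove(documentReference);
        }
    }

    // Computes an entry without holding any lock while the loader runs.
    String get(String documentReference, Supplier<String> loader) {
        String cached = entries.get(documentReference);
        if (cached != null) {
            return cached;
        }
        long startGeneration;
        synchronized (mutex) {
            startGeneration = generation;
        }
        String computed = loader.get(); // may load documents or even trigger a save
        synchronized (mutex) {
            // Only cache the result if no invalidation happened during the computation.
            if (generation == startGeneration) {
                entries.put(documentReference, computed);
            }
        }
        return computed; // returned either way; just not cached if it may be outdated
    }

    boolean isCached(String documentReference) {
        return entries.containsKey(documentReference);
    }
}
```

The single global counter is deliberately coarse: any save during a computation, even of an unrelated document, prevents that result from being cached, so a loader that always saves settings (as in the deadlock case) would never get its entries cached. Tracking generations per document or per space would be more precise, but it requires knowing which cache entries a computation depended on. A stricter variant could also retry the computation instead of returning a result that overlapped an invalidation.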

I’m not a huge fan of this.

I understand that you feel the need to avoid this kind of problem, but IMO a custom security loader should always be read-only, and saving a document in the process is really bad design.

Okay, I can fully understand this.

Is this documented anywhere? If not, we should at least make sure it is documented.

Also, as I’ve said, this is not just about avoiding the deadlock; it is also about preventing document saves from being blocked by a busy security cache.

I’m wondering whether there are ways to avoid this lock. My motivation here (besides the deadlock) is that computing a security access entry may require loading several documents from the database, so invalidation might be blocked for quite some time.

I’ve just created a design page for the security cache refactoring and added some ideas on how to get rid of the invalidation lock without losing the guarantee that all right computations that start after a save has completed respect the changes from that save. I haven’t checked yet whether and how they can actually be implemented; I’ll do this when I start working on the security cache redesign.