Title Cache Design

vmassol · September 8, 2022, 1:32pm

Hi devs,

I’d like to brainstorm about designing a title cache so that we can estimate the cost of implementing one. See New Breadcrumb column type for LD/LT + Replace title/location? - #10 by vmassol for the context.

Here are some requirements I see:

- It needs to work with multilingual XWiki instances (we need to cache per language).
- The title can contain sensitive data, so only users with view permission on the document should be allowed to access the title.
- The cache should be able to hold millions of entries (a wiki with 10M documents and 10 languages means 100M title entries).
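
To make the three requirements concrete, here is a minimal Java sketch under stated assumptions. `TitleCache`, `Key`, `canView` and `worstCaseEntries` are hypothetical names made up for illustration, not existing XWiki APIs. The in-memory map only stands in for whatever store is eventually chosen, since a plain map would not be a realistic home for 100M entries.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

/** Hypothetical sketch: every name here is illustrative, not an XWiki API. */
final class TitleCache {
    /** One entry per (document, language) pair, so titles are cached per language. */
    record Key(String documentReference, Locale locale) {}

    // Stand-in for the real store (DB table or Solr index in the discussion).
    private final Map<Key, String> titles = new ConcurrentHashMap<>();

    // Assumed callback: (user, documentReference) -> true if the user has view right.
    private final BiPredicate<String, String> canView;

    TitleCache(BiPredicate<String, String> canView) {
        this.canView = canView;
    }

    void put(String documentReference, Locale locale, String renderedTitle) {
        titles.put(new Key(documentReference, locale), renderedTitle);
    }

    /** Returns the cached title only when the user is allowed to view the document. */
    Optional<String> get(String user, String documentReference, Locale locale) {
        if (!canView.test(user, documentReference)) {
            return Optional.empty();
        }
        return Optional.ofNullable(titles.get(new Key(documentReference, locale)));
    }

    /** Worst-case number of entries: documents multiplied by languages. */
    static long worstCaseEntries(long documents, int languages) {
        return documents * languages;
    }
}
```

The permission check is done on read rather than baked into the key, so a single cached title can serve every user who is allowed to see the document.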

As Marius mentioned on New Breadcrumb column type for LD/LT + Replace title/location? - #10 by vmassol, we then need to decide the direction we want to take:

- A new table in the DB, or new columns in the xwikispace and/or xwikidoc tables.
- Index the titles in Solr (are the titles already indexed per language in our Solr index?) and change all code that needs titles without loading the document to use Solr queries. This means potential delays if Solr has not yet indexed a title change.
  - For example, move LD to this (there could be a config option to use HQL or Solr for backward compatibility). It also means that LD won’t contain all the entries until Solr has indexed them (think of a first XWiki start, or of an index that is deleted and recreated on a large wiki).

Any pros/cons you see about DB vs Solr for the title cache? Any preference?

Thanks

tmortagne · September 8, 2022, 1:39pm

Yes.

Honestly, whatever the store, it’s going to be asynchronous, as I don’t see us blocking the document save to execute the title and store the rendered result.
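
A rough Java sketch of what "asynchronous" could look like, assuming a single background worker that renders the title after the save has returned. `AsyncTitleUpdater`, `onDocumentSaved`, `cachedTitle` and the `renderer` function are hypothetical names, not XWiki APIs, and the map again only stands in for the real store.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: every name here is illustrative, not an XWiki API. */
final class AsyncTitleUpdater implements AutoCloseable {
    // Stand-in for the real store (DB table or Solr index).
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Single daemon worker so title rendering never runs on the saving thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(task -> {
        Thread thread = new Thread(task, "title-cache-updater");
        thread.setDaemon(true);
        return thread;
    });

    // Assumed function that turns a raw title into its rendered form.
    private final UnaryOperator<String> renderer;

    AsyncTitleUpdater(UnaryOperator<String> renderer) {
        this.renderer = renderer;
    }

    /** Called after a save; returns at once, the rendering happens on the worker. */
    CompletableFuture<Void> onDocumentSaved(String documentReference, String rawTitle) {
        return CompletableFuture.runAsync(
            () -> store.put(documentReference, renderer.apply(rawTitle)), worker);
    }

    /** May return null or a stale title until the pending update has completed. */
    String cachedTitle(String documentReference) {
        return store.get(documentReference);
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

The consequence is the same for either store: between the save and the completion of the update, readers can see a missing or outdated title.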

The good thing with the Solr version is that it’s already implemented, which is quite a pro.

surli · September 8, 2022, 1:41pm

At first sight, I’d go for Solr for the title cache. The problem is that, AFAIR, it’s currently complex to write requests that properly mix DB and Solr, and the main usage of this title cache would be display in LT with all the sorting/filtering capabilities. So it sounds complex to implement if we want to support multiple filters in the same LT without performance issues and with proper handling of pagination.

tmortagne · September 8, 2022, 1:43pm

Yes, if the goal is short-term use in LT, you can forget about Solr. Moving LT to Solr is quite a lot of work, I think.

vmassol · September 8, 2022, 1:43pm

That’s one usage, but we need to take other usages into account: any code that needs to display lots of titles (and thus shouldn’t have to load all the documents just to get the titles).

The LT/LD has this need too (not just for the filtering/ordering but for the title/breadcrumb columns display).

surli · September 8, 2022, 1:51pm

Well, if you want any filtering/ordering in LT/LD on the displayed title, then it cannot be in Solr unless we have a good implementation that allows mixing SQL and Solr queries in LT. So I agree we should also think about other usages, but for me this is a strong argument against going with Solr here, unless you consider that filtering/ordering is not that important.

vmassol · September 8, 2022, 1:54pm

It is very important, as it’s the UC that started this thread.