What to index as the page title for search?

I have a question that is seemingly simple, but I ran into a problem with custom display sheets.

When a page gets indexed, the indexer does not use the “title” field stored in the database directly. Instead, it calls “Document.getPlainTitle”, as the title might contain Velocity code that is executed to render it.

Also, if the page has a document sheet (or a class sheet for one of its objects) attached, the title is rendered as given by the sheet.

Now here is the catch: the title is computed as seen by the anonymous user, as the indexer thread runs in the background without a user in the context. If the sheet that computes the title is not visible to the anonymous user, it is not available to the indexer either, and thus the search only uses the title as stored in the database.
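To make the effect concrete, here is a toy model of it in plain Java. This is not XWiki API: `TitleSheet`, `TitleModel`, `displayTitle` and the user names are made up for illustration, and the real rendering is of course more involved. It only shows the fallback described above.

```java
import java.util.Set;

// Toy model only: none of these types exist in XWiki.
final class TitleSheet {
    private final Set<String> viewers; // users allowed to view the sheet
    private final String prefix;       // what the sheet adds in front of the title

    TitleSheet(Set<String> viewers, String prefix) {
        this.viewers = viewers;
        this.prefix = prefix;
    }

    boolean isVisibleTo(String user) {
        return viewers.contains(user);
    }

    String render(String storedTitle) {
        return prefix + storedTitle;
    }
}

final class TitleModel {
    // Placeholder name for the anonymous user in this sketch.
    static final String GUEST = "XWiki.XWikiGuest";

    // The sheet is only applied if the given user may view it;
    // otherwise the title falls back to the stored value.
    static String displayTitle(String storedTitle, TitleSheet sheet, String user) {
        if (sheet != null && sheet.isVisibleTo(user)) {
            return sheet.render(storedTitle);
        }
        return storedTitle;
    }
}
```

A logged-in user who may view the sheet gets the prefixed title, while the indexer, calling with `GUEST`, only ever gets the stored title.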

This causes some irritation among users, who expect the title shown on the page to be the same as in the search results. Even more irritating is that they cannot search for terms in the (displayed) title.

As an example, take a customization of the TaskManager where the title of the task is prefixed by the project and the task number. In that case, one cannot find the task by project and task number via the search. (That is a poor example, as it is still possible via the TaskManager overview; however, I am working on a custom extension where this is less easy to do.)

How can we deal with this? I can imagine the following solutions:

  1. Keep it as it is, and explain it to the users.
    Implementation-wise, this is certainly the simplest solution. However, this leaves the “explaining” part, which I currently have trouble with.
  2. Index the page as a user other than the anonymous user.
    This risks a security leak: the obvious choice is the “superadmin” account, but then a user could create a page with a document sheet pointing to another page for which they have no view rights and only know the page name, and then read the title of the protected page, which is shown as the title of their own page in the search results.
    One might instead set up and configure a dedicated “Search Indexing” user (with the anonymous user as a fallback), but this seems a bit contrived and nonintuitive. However, it would work in the use case I am facing.
  3. Allow extensions to override the title computation.
    In that case, the extension might check whether an indexed page has an object of a certain class whose sheet is view-protected, and can try to apply the sheet to compute the title even though it is not accessible to the anonymous user.
    Of course this can create data leaks as well, but only if the extension contains a bug and fails to check whether the sheet is “admin only” or the like (which should not happen for a normal extension).
  4. Something else?

I’d prefer solution 3 so far, especially as something similar might also be used to customize the indexed content of the page, e.g. with an extension that omits the content of macros containing code like “velocity”, “groovy”, etc., which is another feature request I got.

Any thoughts on this?

(P.S.: the effect is also visible for user profiles: “Profile of First Lastname” is only shown in search results for user profile pages if the wiki is visible to anonymous users; otherwise it is just the Login/UserName. In that case the sheet does not add much information, however.)

Hello,

I’m not sure I understand solution 3: how would you then get the title of the pages in the indexer? You’d need to first check whether the page belongs to an extension, and then call some extension interface? Sounds expensive.

Looks like the variant of solution 2 with a dedicated user for indexing would be the most interesting compromise in terms of cost / protection / usability.

+1 for solution 2 with a configurable user (which would be guest by default), as a first fix at least
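For what it's worth, the "configurable user, guest by default" part could be as small as the following sketch. The configuration key `search.indexer.user`, the class name and the guest placeholder are all assumptions made up for this example, not existing XWiki configuration.

```java
import java.util.Map;

// Sketch only: the key and the class are hypothetical, not XWiki configuration.
final class IndexingUserConfig {
    // Placeholder name for the anonymous user in this sketch.
    static final String GUEST = "XWiki.XWikiGuest";
    // Hypothetical configuration key for the dedicated indexing user.
    static final String KEY = "search.indexer.user";

    // Returns the configured indexing user, or the guest user
    // if nothing (or only whitespace) is configured.
    static String resolveIndexingUser(Map<String, String> config) {
        String configured = config.get(KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return GUEST;
        }
        return configured.trim();
    }
}
```

With an empty configuration this keeps the current behavior, so nothing changes for existing wikis unless an admin explicitly sets a user.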

Yes, I thought of having a default component for the current behavior, which extensions can override to check whether the page contains an object of their extension. I thought it would not be so expensive, as the page will already be loaded, including its objects.
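Roughly, what I had in mind looks like the sketch below. All names (`Page`, `IndexedTitleProvider`, `TaskTitleProvider`, `IndexedTitleResolver`, the class name string) are hypothetical and written in plain Java rather than as real XWiki components; the stored title stands in for the current default behavior.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for an already loaded page and its objects.
final class Page {
    final String storedTitle;
    final List<String> objectClasses; // class names of the objects on the page

    Page(String storedTitle, List<String> objectClasses) {
        this.storedTitle = storedTitle;
        this.objectClasses = objectClasses;
    }
}

// Hypothetical extension point: an extension returns a title
// only for pages it is responsible for.
interface IndexedTitleProvider {
    Optional<String> titleFor(Page page);
}

// Example provider for a task-like extension (class name is a placeholder).
final class TaskTitleProvider implements IndexedTitleProvider {
    static final String TASK_CLASS = "TaskManager.TaskManagerClass";
    private final String projectPrefix;

    TaskTitleProvider(String projectPrefix) {
        this.projectPrefix = projectPrefix;
    }

    @Override
    public Optional<String> titleFor(Page page) {
        // Cheap check: the objects are already loaded with the page.
        if (!page.objectClasses.contains(TASK_CLASS)) {
            return Optional.empty();
        }
        return Optional.of(projectPrefix + page.storedTitle);
    }
}

final class IndexedTitleResolver {
    private final List<IndexedTitleProvider> providers;

    IndexedTitleResolver(List<IndexedTitleProvider> providers) {
        this.providers = providers;
    }

    // The first provider that claims the page wins; otherwise fall back
    // to the default (here simply the stored title).
    String resolve(Page page) {
        for (IndexedTitleProvider provider : providers) {
            Optional<String> title = provider.titleFor(page);
            if (title.isPresent()) {
                return title.get();
            }
        }
        return page.storedTitle;
    }
}
```

The per-page cost is one list lookup per registered provider, which is why I assumed it would be cheap. A real provider would additionally have to verify that the sheet it applies is not “admin only”.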

Then I am also ok with solution 2.

Update: I have made an implementation proposal as a PR here:

https://github.com/xwiki/xwiki-platform/pull/1904