Waiting for Solr indexing on refactoring operations

Hi everyone,

when a page is moved or deleted, we refactor links that point to this page. However, this only works when the links are known to XWiki. Prior to XWIKI-19352, which was implemented in XWiki 14.2, links were indexed synchronously during the saving of the document. Nowadays (since XWiki 14.8, see XWIKI-16192), links are indexed when a page is indexed in the Solr search index. This means that while indexing is still running, e.g., because of a re-indexing operation or a big import, or because the page itself has just been moved, links won’t be updated.

Back when asynchronous indexing was implemented, it was discussed that refactoring jobs should wait for indexing to be complete and there should be a progress indicator - at least when an admin enables an option to consider the indexing status to be critical. Unfortunately, this waiting has never been implemented and so currently this leads to bugs. The issue XWIKI-22323 has been created to track this regression.

I propose the following changes.

  1. By default, wait for indexing to complete and display the indexing progress in the job progress. Based on the initial number of documents in the queue, a number of steps will be created that are completed as the queue becomes emptier. If the queue size increases again, the remaining steps will be distributed among the newly added documents.
  2. Add an option in the scripting API (in the job options) to not wait for indexing to complete.
  3. Add a global xwiki.properties option to not wait for indexing to complete.
  4. Add an option for advanced users to not wait for indexing in the UI of move and delete jobs that can be enabled when link updating is enabled.
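
To make the step distribution in point 1 more concrete, here is a minimal sketch of how it could work. The class `IndexingProgressSketch` and its methods are hypothetical names for illustration only, not existing XWiki API, and the queue size is assumed to be polled from time to time by the job.

```java
/** Hypothetical helper for illustration; not an existing XWiki class. */
final class IndexingProgressSketch {
    private final int totalSteps;
    private int completedSteps;
    // Queue size over which the remaining steps are currently distributed.
    private long baselineSize;
    // Steps that were already completed when the baseline was last set.
    private int baselineCompleted;
    private long lastObservedSize;

    IndexingProgressSketch(int totalSteps, long initialQueueSize) {
        this.totalSteps = totalSteps;
        this.baselineSize = initialQueueSize;
        this.lastObservedSize = initialQueueSize;
    }

    /**
     * Record a new observation of the queue size.
     *
     * @return the number of steps newly completed by this observation
     */
    int update(long currentQueueSize) {
        if (currentQueueSize > lastObservedSize) {
            // The queue grew: spread the remaining steps over the larger queue.
            baselineSize = currentQueueSize;
            baselineCompleted = completedSteps;
        }
        lastObservedSize = currentQueueSize;

        int remainingSteps = totalSteps - baselineCompleted;
        long processed = baselineSize - currentQueueSize;
        int target = baselineSize == 0
            ? totalSteps
            : baselineCompleted + (int) (remainingSteps * processed / baselineSize);

        // Progress never goes backwards, even if the estimate drops.
        int newlyCompleted = Math.max(0, target - completedSteps);
        completedSteps += newlyCompleted;
        return newlyCompleted;
    }

    int getCompletedSteps() {
        return completedSteps;
    }
}
```

With 10 steps and 100 queued documents, a drop to 50 completes 5 steps. If the queue then grows to 60, no step is lost, and the 5 remaining steps are spread over those 60 documents.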

To me, 1 and 2 seem enough, I wouldn’t implement 3 and 4, so +1 for 1 and 2 and -0 for 3 and 4. Any opinions?

Same here, I don’t think 3 and 4 are really needed.

Note: instead of hacking around #getQueueSize(), I think it’s time to introduce in SolrIndexer (and probably the ScriptService) a more reliable API to wait for all the entries currently in the queues to be handled (for example by adding a special entry to the queues which unlocks the wait when it’s “indexed”).
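
A rough sketch of the special-entry idea, using only plain `java.util.concurrent` classes. Everything here (`MarkerQueueSketch`, `Entry`, `submitMarker`, `processNext`) is a hypothetical name for illustration, not the actual SolrIndexer API, and the real indexer has more than one queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of an indexing queue with marker entries. */
final class MarkerQueueSketch {
    /** A queue entry: either a document to index or a marker carrying a latch. */
    static final class Entry {
        final String documentReference; // null for marker entries
        final CountDownLatch marker;    // null for regular entries

        Entry(String documentReference, CountDownLatch marker) {
            this.documentReference = documentReference;
            this.marker = marker;
        }
    }

    private final BlockingQueue<Entry> queue = new LinkedBlockingQueue<>();
    private final List<String> indexed = new ArrayList<>();

    void submit(String documentReference) {
        queue.add(new Entry(documentReference, null));
    }

    /** Enqueue a marker; the returned latch opens once everything queued before it is handled. */
    CountDownLatch submitMarker() {
        CountDownLatch latch = new CountDownLatch(1);
        queue.add(new Entry(null, latch));
        return latch;
    }

    /** Handle one entry (normally called by the indexing thread). Returns false if the queue is empty. */
    boolean processNext() {
        Entry entry = queue.poll();
        if (entry == null) {
            return false;
        }
        if (entry.marker != null) {
            entry.marker.countDown(); // "indexing" the marker unlocks the waiter
        } else {
            indexed.add(entry.documentReference); // stand-in for the real indexing work
        }
        return true;
    }

    List<String> getIndexed() {
        return indexed;
    }
}
```

A caller would hold on to the latch returned by `submitMarker()` and call `await(timeout, unit)` on it, which returns `false` when the timeout expires, so the wait does not have to be unbounded. Entries added after the marker do not delay the waiter.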

+1, same as you.

Thanks,
Marius

Imagine an XWiki start where all the pages are getting indexed, and a wiki with millions of pages (with an indexing that can take hours). This will mean that any refactoring operation performed shortly after the restart will seem to hang “forever”.

Thus, I think we need to do the following:

  • Check if there are elements to be indexed in the indexing queue and when the user clicks on apply for the refactoring (move, delete, etc), if the # of elements is large (value to be defined), show a warning explaining that it’ll need to wait for all the elements to be indexed before proceeding and ask for confirmation.
  • Indicate somewhere in the progress UI that it’s waiting for the indexing to be finished + indicate the number of elements remaining to be indexed, and refresh that number every N seconds, to show progress.
  • (optional but probably good, could be done later) your option 4

WDYT?

Thx

While this sounds good, I think at the very least this waiting should be time-limited in order to provide progress information from time to time. But indeed, it probably makes sense to provide such a wait method with a timeout.

This would be a bug, the index is supposed to persist over restarts.

I fear the criterion for what counts as large is difficult to define, as it depends a lot on the content and the performance of the system. What we could do instead is ask that question after waiting for, say, 10 seconds when the remaining queue size is still more than half of what it was at the beginning of the job. There could then also be options to ask again after some time or to wait indefinitely.
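
As a sketch, the check could be as simple as the following. `IndexingWaitPolicySketch` and `shouldAskUser` are hypothetical names, and the grace period is a parameter because 10 seconds is only an example value.

```java
import java.time.Duration;

/** Hypothetical sketch of the "ask after a grace period" heuristic. */
final class IndexingWaitPolicySketch {
    private IndexingWaitPolicySketch() {
    }

    /**
     * @param initialQueueSize queue size when the job started waiting
     * @param remainingQueueSize queue size now
     * @param waited how long the job has been waiting so far
     * @param gracePeriod how long to wait before considering a question (e.g. 10 seconds)
     * @return true if the user should be asked whether to keep waiting
     */
    static boolean shouldAskUser(long initialQueueSize, long remainingQueueSize,
        Duration waited, Duration gracePeriod) {
        if (waited.compareTo(gracePeriod) < 0) {
            return false; // still within the grace period, keep waiting silently
        }
        // Ask only if more than half of the initial queue is still pending.
        return remainingQueueSize * 2 > initialQueueSize;
    }
}
```

This avoids an absolute threshold for "large": a fast system drains most of the queue within the grace period and never asks, while a slow one asks no matter how many documents were queued.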

That’s what I meant by “display the indexing progress in the job progress” in my proposal.

Sure, but it can be the first start, or someone may have clicked “reindex all” in the admin UI, so the use case exists.

Sounds good, seems a little more complex to implement.

ok great, had missed it, thx.

Implementing a question is actually generally easy when the standard job displayer is used (which is the case for the refactoring jobs, I think).