SOLR is reindexing all pages after xwiki restart

AutomationX · September 18, 2018, 8:07pm

Environment:

Docker version 18.06.1-ce, build e68fc7a on ubuntu 16.04.10
xwiki:10.7-postgres-tomcat
postgres:9.6.10

When restarting the xwiki docker container (not deleting/recreating anything), SOLR is reindexing all pages.
This can be seen in the administration by watching the index queue size decrease.
Since that’s thousands of pages, it takes several minutes.

Is this expected behavior?

The directory /usr/local/xwiki/data/solr is persisted on the defined Docker volume, and data inside the container itself shouldn’t be deleted as long as the container isn’t recreated, right?

Thank you for your answer,
BR
Mario

tmortagne · September 19, 2018, 7:40am

Definitely not.

There is a better way to see this: at startup the SOLR module checks whether pages are missing from the index or whether there are SOLR documents to remove, and reports that in the log.

AutomationX · September 19, 2018, 7:55am

Dear Thomas, I believe this is in fact expected behavior.
I’ve found this in xwiki.properties:

#-# [Since 6.1M2]
#-# Indicating if a synchronization between SOLR index and XWiki database should be run at startup.
#-# Synchronization can be started from search administration.
#-# The default is true.
# solr.synchronizeAtStartup=false

And also here. If I uncomment solr.synchronizeAtStartup=false the re-index does not occur.
So that answers my question.
Maybe you could add a comment on this in the SOLR documentation (or maybe I’ve overlooked it).
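
For anyone who wants to script that change (for example when preparing a custom image), here is a minimal Python sketch that turns the commented-out line into an active setting. The function name set_property is made up for illustration, and it only works on the file's text; locating xwiki.properties in your setup and restarting XWiki afterwards are left to you.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`.

    An existing line for the key, commented out ("# key=...") or active,
    is replaced in place. If no such line exists, the setting is appended.
    Documentation lines such as "#-# ..." are left untouched.
    """
    # [ \t]* instead of \s* so the match never spans more than one line.
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$",
        re.MULTILINE,
    )
    line = f"{key}={value}"
    if pattern.search(text):
        # A callable replacement avoids backslash processing in `value`.
        return pattern.sub(lambda _match: line, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + line + "\n"


# The snippet quoted above from xwiki.properties.
sample = (
    "#-# The default is true.\n"
    "# solr.synchronizeAtStartup=false\n"
)
updated = set_property(sample, "solr.synchronizeAtStartup", "false")
```

After the call, `updated` contains the active line `solr.synchronizeAtStartup=false` while the `#-#` documentation line above it is unchanged.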

BR
Mario

tmortagne · September 19, 2018, 8:06am

Actually you misunderstood. As I said, at startup SOLR tries to find missing pages and stuff to remove from the index; it does not reindex the whole thing from scratch.

AutomationX · September 19, 2018, 5:47pm

Sorry, I misread that.

I’ve now measured the time difference on our realistic test system between a manual reindex action (took ~3min 30sec) and restarting the xwiki container with solr.synchronizeAtStartup=true (took ~4min 45sec).
Sooooo, I’m not quite convinced. Maybe there is a bug somewhere.

Since I’m currently evaluating whether to use an external SOLR Docker container instead of the embedded one, this issue has no priority for us. (It will be interesting to see what happens in that setup.)

Thank you for your feedback nevertheless.

BR

tmortagne · September 20, 2018, 12:57pm

You should probably report a Jira issue with all the details about your environment.

dgervalle · September 22, 2018, 9:25am

You might be interested to check this issue: Loading...
Adding the index described in there could dramatically improve the synchronization process that happens at startup. This process, as you have seen, can be safely disabled, but it is a way to ensure the SOLR index is fully up to date and clean despite any hiccup that might happen during live indexing. It only compares the state of the database with the state of the index, and only updates the index when needed. It also allows you to safely delete the SOLR index and have it recreated from scratch upon startup if your index was lost or gets corrupted.
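
To make the "compare, then update only when needed" idea concrete, here is a rough Python sketch. It is not the actual XWiki code: the function name plan_sync, the page references, and the use of a version string as the thing being compared are all assumptions for illustration.

```python
from __future__ import annotations


def plan_sync(
    db_versions: dict[str, str],
    index_versions: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Compare database state with index state and return the work to do.

    Both arguments map a page reference to its version (hypothetical model).
    Returns (to_index, to_delete):
      to_index  - pages missing from the index or indexed at another version
      to_delete - index entries whose page no longer exists in the database
    """
    to_index = sorted(
        ref for ref, version in db_versions.items()
        if index_versions.get(ref) != version
    )
    to_delete = sorted(ref for ref in index_versions if ref not in db_versions)
    return to_index, to_delete


# Made-up example data: one page is up to date, one is stale,
# one is missing from the index, and one index entry is orphaned.
database = {"Main.WebHome": "2.1", "Sandbox.Test": "1.3", "Blog.Post": "1.1"}
index = {"Main.WebHome": "2.1", "Sandbox.Test": "1.2", "Old.Page": "4.1"}

to_index, to_delete = plan_sync(database, index)
print("to index: ", to_index)
print("to delete:", to_delete)
```

Note that even in this toy version the comparison visits every entry on both sides, including pages that turn out to be up to date. Starting from an empty index puts every page in to_index, which corresponds to the "recreated from scratch" case.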