Hi,
I am trying to set up an XWiki cluster with docker-compose.
My problem is that when I set up two XWiki instances that share the same database (PostgreSQL), the Distribution Wizard breaks everything.
I can install the Standard Flavor on one XWiki instance. The second instance then recognizes this because it uses the same DB. If I log in to the second instance with the admin user I created, the Distribution Wizard pops up again, so I click through it. Over several hours I tried every option I could think of: skipping the wizard, leaving the wiki empty, installing the Standard Flavor there too, and upgrading, merging, or using the existing pages. In every case the XWiki logo in the navbar is missing on the second instance, although the colors and the overall design of the Standard Flavor are correct. The second instance also hangs as soon as I try to do anything in the wiki: the page editors never load, and the only thing I can still change is the title of a page.
As a next step I left both XWiki instances empty, still sharing the same PostgreSQL DB as before. I created the admin user on one instance in the startup wizard and then tried to import the Standard Flavor XAR into both instances, without success. The upload of the XAR file succeeds, but when I try to install it I am told to select at least one page. Neither instance shows any pages to select when I click the XAR to install it (maybe it hangs again).
I get the following error message in the log of xwiki02 when I try to install the Standard Flavor XAR on xwiki01. I think this is the known limitation of attachment synchronization in cluster mode and therefore normal behaviour, or is it not?
xwiki02 | 2021-03-10 08:18:44,701 [jgroups-17,event,db1a0461e6e1-42388] ERROR c.x.x.d.XWikiDocument - Error while trying to load deleted attachment [parentReference = [xwiki:XWiki.XWikiPreferences], filename = [org.xwiki.platform_xwiki-platform-distribution-flavor-mainwiki_13.1.xar], version = [1.1]] for doc [XWiki.XWikiPreferences]
If I install the flavor on one XWiki instance through the Distribution Wizard and then access the other instance, I get this error:
xwiki02 | 2021-03-10 09:11:19,570 [XWiki Solr index thread] ERROR .DocumentSolrMetadataExtractor - Failed to retrieve the content of attachment [Attachment xwiki:Help.Macros.ToC.WebHome@toc.png]
xwiki02 | com.xpn.xwiki.XWikiException: Error number 3002 in 3: The attachment [Attachment xwiki:Help.Macros.ToC.WebHome@toc.png] (file /usr/local/xwiki/data/store/file/xwiki/0/f/5e6e3bd348f1a84a245b0b9a573c0c/attachments/e/7/063883e72df3cf9857dd33d625b585/f.png) could not be found in the filesystem attachment store.
Is this perhaps the error that explains why the navbar logo cannot be loaded, or what does it mean?
In general the cluster comes up successfully: if I change something on the working XWiki instance, the change is synchronized to the other instance, even though the other instance has less functionality (maybe because of the missing flavor) and hangs if I try to modify anything besides the page title.
Installation setup:
I use docker-compose to install the XWiki cluster on one host.
I first bring up both XWiki instances without the cluster setup.
Both XWiki instances use the same PostgreSQL DB.
Then I tried to set up each XWiki instance on its own, as described above.
After that I copied the persistent folders (tomcat, xwiki, postgresql/data) to the host filesystem.
Each XWiki instance has its own path for its persistent folders; only the postgresql folder is shared.
Then I modified the xwiki.properties file of each xwiki instance and added the following settings:
observation.remote.enabled = true
observation.remote.channels = tcp
After that I placed a tcp.xml file at the following path in both XWiki instances:
tomcat/webapps/ROOT/WEB-INF/observation/remote/jgroups/tcp.xml
Then I mounted the persistent folders of each instance in the docker-compose file.
I set JAVA_OPTS to initialize the JGroups tcp cluster channel with these settings:
On the first instance:
- JAVA_OPTS=-Djgroups.bind_addr=xwiki01 -Djgroups.tcpping.initial_hosts=xwiki01[7800]
On the second instance:
- JAVA_OPTS=-Djgroups.bind_addr=xwiki02 -Djgroups.tcpping.initial_hosts=xwiki01[7800]
Then I started the containers.
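Put together, the steps above correspond roughly to a compose file like the one below. This is only a minimal sketch, not my exact file: the image tags, the database credentials, the service name `db`, the published ports, the host paths, and the container path /usr/local/tomcat are assumptions for illustration. Only the JAVA_OPTS lines and the /usr/local/xwiki path (which appears in the log) are taken from the actual setup.

```yaml
# Sketch only. Image tags, credentials, ports, and host paths are
# placeholders (assumptions), not the values of the real setup.
version: "3"
services:
  db:
    image: postgres:13                   # assumed tag
    environment:
      - POSTGRES_USER=xwiki              # placeholder credentials
      - POSTGRES_PASSWORD=xwiki
      - POSTGRES_DB=xwiki
    volumes:
      - ./postgresql/data:/var/lib/postgresql/data

  xwiki01:
    image: xwiki:13.1-postgres-tomcat    # assumed tag
    container_name: xwiki01
    depends_on:
      - db
    ports:
      - "8080:8080"                      # assumed host port
    environment:
      - DB_HOST=db
      - DB_USER=xwiki
      - DB_PASSWORD=xwiki
      - DB_DATABASE=xwiki
      # First node: binds to its own name and lists itself as initial host.
      - JAVA_OPTS=-Djgroups.bind_addr=xwiki01 -Djgroups.tcpping.initial_hosts=xwiki01[7800]
    volumes:
      # Folders copied out of the container beforehand, so they are not empty.
      - ./xwiki01/xwiki:/usr/local/xwiki
      - ./xwiki01/tomcat:/usr/local/tomcat   # assumed container path

  xwiki02:
    image: xwiki:13.1-postgres-tomcat    # assumed tag
    container_name: xwiki02
    depends_on:
      - db
    ports:
      - "8081:8080"                      # assumed host port
    environment:
      - DB_HOST=db                       # same database as xwiki01
      - DB_USER=xwiki
      - DB_PASSWORD=xwiki
      - DB_DATABASE=xwiki
      # Second node: binds to its own name and joins via xwiki01.
      - JAVA_OPTS=-Djgroups.bind_addr=xwiki02 -Djgroups.tcpping.initial_hosts=xwiki01[7800]
    volumes:
      - ./xwiki02/xwiki:/usr/local/xwiki
      - ./xwiki02/tomcat:/usr/local/tomcat   # assumed container path
```

Note that in this layout each instance has its own /usr/local/xwiki folder on the host, while the database is shared.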
So this is the procedure I have tried.
Is this the right way to set up a cluster? Is the JGroups initialization correct?
I cannot find any information in the official XWiki clustering tutorial on how to properly set up the members that join the cluster so that nothing crashes.
I should add that the docker-compose setup with the persistent folders on the host works great and without errors when I run only a standalone XWiki instance. The problem only appears when I try to set up a cluster.
I also do not understand how to set up the cluster properly. Can you give some more information? The clustering tutorial is not complete enough.
I think the cluster sync is working, because both nodes synchronize their changes to each other. But I cannot add new cluster members, or build a cluster whose instances have identical design and functionality, without one instance missing part of the design or functionality (that instance hangs when loading the editor).
This is the output of the log when the cluster comes up:
xwiki01 |
xwiki01 | -------------------------------------------------------------------
xwiki01 | GMS: address=a01a9ae92441-41063, cluster=event, physical address=172.31.0.4:7800
xwiki01 | -------------------------------------------------------------------
xwiki02 |
xwiki02 | -------------------------------------------------------------------
xwiki02 | GMS: address=db1a0461e6e1-42388, cluster=event, physical address=172.31.0.3:7800
xwiki02 | -------------------------------------------------------------------
xwiki02 | 2021-03-10 08:12:19,162 [OfficeProcessThread-0] INFO .o.r.i.j.JGroupsNetworkAdapter - Channel [tcp] started
xwiki01 | 2021-03-10 08:12:19,461 [OfficeProcessThread-0] INFO .o.r.i.j.JGroupsNetworkAdapter - Channel [tcp] started
xwiki02 | 2021-03-10 08:12:19,793 [localhost-startStop-1] INFO o.x.o.i.s.DefaultOfficeServer - Open Office instance started.
xwiki01 | 2021-03-10 08:12:20,030 [localhost-startStop-1] INFO o.x.o.i.s.DefaultOfficeServer - Open Office instance started.
If you need screenshots, I can provide them. Please let me know.
Please help me. It is very important for me to set up a working cluster. One instance is not enough for my use case.
Thanks for your support.