I’m feeling totally at sea trying to figure out how to get clustering up and running. The XWiki docs say to create an XML file, but the referenced JGroups documentation goes into detail about almost everything except how to create such a file. Is there a sample file available appropriate to a typical XWiki multi-server environment?
In a fault tolerant situation, if a new server comes up to replace a fallen one, will it be sufficient for the new server to simply have a copy of the XML in the right place for the cluster configuration to stabilize around the new collection of servers?
Indeed. Right now the doc says to use the examples provided by JGroups. We used to have an example file but I guess it got removed because it was never up to date with what JGroups recommends (since they provide example files).
The tutorial says:
Here we set udp in both instances to use embedded JGroups udp.xml configuration file which auto discover cluster members. In general you should always start by looking at the example configurations files in jgroups jar. Most of them are even configurable with system properties so you don’t even have to copy and modify them.
It would be nice if someone with more knowledge updated the doc to explain what I said (if it’s correct) and to provide better links pointing directly to the example config files.
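In the meantime, one way to see which example configuration files ship with JGroups is to list the XML entries inside the jar itself, since a jar is just a zip archive. This is only a minimal sketch: the function name is made up for illustration, and it assumes the example files sit at the top level of the jar.

```python
import zipfile


def list_example_configs(jar_path):
    """Return the sorted names of top-level .xml entries in a jar file.

    Assumption: the JGroups example configurations (such as the udp.xml
    mentioned in the tutorial) are stored at the top level of the jar.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            # Keep only XML files that are not inside a sub-directory.
            if name.endswith(".xml") and "/" not in name
        )
```

Calling list_example_configs with the path to the JGroups jar bundled in your XWiki WEB-INF/lib directory should give you the candidate file names to start from.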
That’s not how it works. Normally all the servers must be up. When one goes down, the other ones that are already up continue serving the clients. If an xwiki server is down then it won’t receive the messages from the other servers. OTOH that shouldn’t be an issue since that server will read the wiki page content from the DB (which is always up to date). So it should work fine.
I’m implementing XWiki on EC2 virtual machine instances in AWS, so my design needs to account for the possibility of a complete failure of an instance running XWiki due to hardware failures outside of my control. I want to be able to be running two XWiki servers, and automatically handle the situation in which one fails, a new instance is created to replace the one that disappeared, and the new one reestablishes its connection to the one that survived.
I’m running a regular backup of server content per the backup directions in the documentation, and configured the new server startup procedure to load that backup. The database is running in a separate AWS service, not on the XWiki server itself, so I’m not in any danger of losing the database in the event of an XWiki server failure (I’m running backups of the database in any case).
So, are we saying that if I can get a JGroups configuration set up on my servers, all should be well?
Cool! It would be awesome if you contributed some doc on xwiki.org explaining how to set that up once you have it working.
I think so. You should just need to configure XWiki’s clustering feature and it should work: when the new xwiki server loads, it’ll join the cluster and it’ll have an empty document cache and thus get its data from the DB. There’s the issue of the perm dir (installed extensions) but if you copy an existing perm dir (and remove the cache directories, see https://www.xwiki.org/xwiki/bin/view/Documentation/AdminGuide/Configuration/#HPermanentDirectory), I think it should work.
You should also probably set up the SOLR instance to be a remote SOLR instance (and not embedded).
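For the perm dir copy mentioned above, a rough sketch of what the new server's startup procedure could run is below. The function name and the default list of directories to skip are assumptions made for illustration; check the Permanent Directory documentation linked above for the actual cache directory names in your XWiki version.

```python
import os
import shutil


def copy_permanent_dir(src, dst, skip_names=("cache",)):
    """Copy the permanent directory src to dst, skipping cache directories.

    Assumption: skip_names lists the top-level directory names to leave
    out. The default ("cache",) is a guess, not a confirmed list.
    dst must not exist yet (shutil.copytree creates it).
    """
    top = os.path.abspath(src)

    def ignore(directory, names):
        # Only filter entries directly under the permanent directory,
        # so a nested folder that happens to share a name is kept.
        if os.path.abspath(directory) == top:
            return [name for name in names if name in skip_names]
        return []

    shutil.copytree(src, dst, ignore=ignore)
```

Skipping the cache directories at copy time has the same effect as copying everything and deleting them afterwards, but avoids transferring data the new server would rebuild anyway.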