CPU 100% for a while, xwiki can't seem to recover afterwards, not sure how to debug

I’m having trouble with a new xwiki with some old(er) pages in it. We’re running it on a docker swarm.

long story:
We’re running xwiki offline (no way to get to the internet). I updated an old xwiki (13.10) to 14.9.0, which looked like it went really well! But after some time I was getting high CPU values and xwiki just got stuck (pending on getting a page, any page) and would stay that way indefinitely. I’ve tried letting it run its course, but waiting 45+ minutes for a page isn’t really normal.

Eventually I opted to just start a new xwiki and import our pages, hoping that any errors were due to incompatibilities. No luck there! So now I’m stuck with xwiki 14.10.1, which randomly fails, sometimes every hour, sometimes once a week.

I’ve changed logback.xml to log almost everything at debug level, hoping to catch what’s going on, but I can’t find anything too far out of the ordinary (untrained eyes). The only thing I can see is that it eventually produces a lot of OutOfMemoryErrors, so I’ve changed the -Xmx value a few times, but it doesn’t change anything.

The symptoms:
The problem can occur at any point while xwiki is running (after initialization). What we see happening every time, just before xwiki becomes unresponsive, is a big rise in CPU usage (500% CPU when I’m looking at top in the container). During that short CPU spike xwiki gets slower until it’s unresponsive, and it stays that way. When looking in the logs I see stack traces like this:

2022-12-27 09:57:10,014 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@7667a612([solr, indexer])] DEBUG o.x.x.i.SafeTreeMarshaller     - Failed to serialize item [sun.nio.ch.FileChannelImpl@3146e4c3]
com.thoughtworks.xstream.converters.ConversionException: No converter available
---- Debugging information ----
message             : No converter available
type                : jdk.internal.ref.CleanerImpl$PhantomCleanableRef
converter           : com.thoughtworks.xstream.converters.reflection.ReflectionConverter
message[1]          : Unable to make field jdk.internal.ref.PhantomCleanable jdk.internal.ref.PhantomCleanable.prev accessible: module java.base does not "opens jdk.internal.ref" to unnamed module @6e80de85
-------------------------------
        at com.thoughtworks.xstream.core.DefaultConverterLookup.lookupConverterForType(DefaultConverterLookup.java:88)
        at com.thoughtworks.xstream.XStream$1.lookupConverterForType(XStream.java:485)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:48)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:83)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshallField(AbstractReflectionConverter.java:270)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter$2.writeField(AbstractReflectionConverter.java:174)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.doMarshal(AbstractReflectionConverter.java:262)
        at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:90)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:87)
        at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeBareItem(AbstractCollectionConverter.java:94)
        at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:66)
        at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeCompleteItem(AbstractCollectionConverter.java:81)
        at org.xwiki.xstream.internal.SafeArrayConverter.writeCompleteItem(SafeArrayConverter.java:81)
        at org.xwiki.xstream.internal.SafeMessageConverter.marshal(SafeMessageConverter.java:86)
        at org.xwiki.xstream.internal.SafeLogEventConverter.marshal(SafeLogEventConverter.java:74)
        at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:68)
        at org.xwiki.xstream.internal.SafeTreeMarshaller.convert(SafeTreeMarshaller.java:72)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
        at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:82)
        at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
        at com.thoughtworks.xstream.XStream.marshal(XStream.java:1278)
        at com.thoughtworks.xstream.XStream.marshal(XStream.java:1267)
        at com.thoughtworks.xstream.XStream.toXML(XStream.java:1240)
        at org.xwiki.logging.internal.tail.XStreamFileLoggerTail.write(XStreamFileLoggerTail.java:82)
        at org.xwiki.logging.internal.tail.AbstractTextFileLoggerTail.write(AbstractTextFileLoggerTail.java:66)
        at org.xwiki.logging.internal.tail.AbstractFileLoggerTail.writeLog(AbstractFileLoggerTail.java:278)
        at org.xwiki.logging.internal.tail.AbstractFileLoggerTail.log(AbstractFileLoggerTail.java:261)
        at org.xwiki.logging.event.LoggerListener.onEvent(LoggerListener.java:82)
        at org.xwiki.observation.WrappedThreadEventListener.onEventInternal(WrappedThreadEventListener.java:77)
        at org.xwiki.observation.AbstractThreadEventListener.onEvent(AbstractThreadEventListener.java:57)
        at org.xwiki.observation.internal.DefaultObservationManager.notify(DefaultObservationManager.java:320)
        at org.xwiki.observation.internal.DefaultObservationManager.notify(DefaultObservationManager.java:285)
        at org.xwiki.logging.logback.internal.LogbackEventGenerator.append(LogbackEventGenerator.java:148)
        at org.xwiki.logging.logback.internal.LogbackEventGenerator.append(LogbackEventGenerator.java:64)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
        at ch.qos.logback.classic.Logger.debug(Logger.java:486)
        at org.apache.solr.search.stats.LocalStatsCache.doGet(LocalStatsCache.java:40)
        at org.apache.solr.search.stats.StatsCache.get(StatsCache.java:221)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:358)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:369)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
        at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003)
        at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1018)
        at org.xwiki.search.solr.internal.AbstractSolrInstance.query(AbstractSolrInstance.java:124)
        at org.xwiki.search.solr.internal.job.SolrDocumentIterator.getResults(SolrDocumentIterator.java:121)
        at org.xwiki.search.solr.internal.job.SolrDocumentIterator.size(SolrDocumentIterator.java:106)
        at org.xwiki.search.solr.internal.job.DiffDocumentIterator.size(DiffDocumentIterator.java:153)
        at org.xwiki.search.solr.internal.job.IndexerJob.updateSolrIndex(IndexerJob.java:121)
        at org.xwiki.search.solr.internal.job.IndexerJob.runInternal(IndexerJob.java:103)
        at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
        at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

And it keeps throwing this exception…

Does anyone have an idea where to look specifically?
Thanks!

I still need to look deeper into this problem, but I’ve seen some improvement after I stumbled on another problem: recently I lost the connection to my SMTP server, so I reset the email options in xwiki, and for some reason it now doesn’t seem to hang as frequently!

Does this sound reasonable or am I just seeing things…

I’ve set up a testing environment for this issue and I can now say I can make it fail after 5 minutes of startup.

We’re using our docker swarm with Traefik for routing and Icinga for monitoring. The following is our .yml file:

version: '3.4'
services:
  test-xwiki-web:
    image: xwiki:14.10.1-mysql-tomcat
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.services.test-xwiki.loadbalancer.server.port=8080
        - traefik.http.routers.test-xwiki_http.rule=Host(`redacted`)
        - traefik.http.routers.test-xwiki_http.entrypoints=http
        - traefik.http.routers.test-xwiki_http.middlewares=https_redirect
        - traefik.http.routers.test-xwiki_https.rule=Host(`redacted`)
        - traefik.http.routers.test-xwiki_https.entrypoints=https
        - traefik.http.routers.test-xwiki_https.tls=true
        - traefik.http.middlewares.https_redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https_redirect.redirectscheme.permanent=true
        - traefik.docker.network=traefik
    environment:
      - DB_USER=xwiki
      - DB_PASSWORD=redacted
      - DB_DATABASE=xwiki
      - DB_HOST=test-xwiki-db
      - JAVA_OPTS="-Xmx2048m"
    volumes:
      - /redacted/xwiki-test/logback.xml:/usr/local/tomcat/webapps/ROOT/WEB-INF/classes/logback.xml
      - /redacted/xwiki-test/xwiki-data:/usr/local/xwiki
    networks:
      - default
      - traefik
      - icinga
  test-xwiki-db:
    image: mysql:5.7.34
    volumes:
      - /redacted/xwiki-test/mysql-data:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=redacted
      - MYSQL_USER=xwiki
      - MYSQL_PASSWORD=redacted
      - MYSQL_DATABASE=xwiki
    configs:
      - source: mysql-config
        target: /etc/mysql/conf.d/xwiki.cnf
    networks:
      - default
      - icinga
configs:
  mysql-config:
    external:
      name: xwiki-mysql-config
networks:
  traefik:
    external: true
  icinga:
    external: true

When the load jumps we see the following in CPU/memory (new instance started at 13:40, hanging starts at 13:55):
[Images: CPU and memory usage graphs]

As you can see, the CPU usage goes down after 14:05, but the problem still persists after that and I just can’t get a grasp on why…

I think I’ve also found the exact moment it starts getting java.lang.OutOfMemoryError: Java heap space errors:

2023-01-02 13:54:52,044 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing expired connections
2023-01-02 13:54:52,044 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing expired connections
2023-01-02 13:54:52,044 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing connections idle longer than 50000 MILLISECONDS
2023-01-02 13:54:52,044 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing connections idle longer than 50000 MILLISECONDS
2023-01-02 13:55:12,888 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.q.c.QuartzSchedulerThread    - batch acquisition of 0 triggers
2023-01-02 13:56:00,337 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing expired connections
2023-01-02 13:56:00,337 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@5da1c668([solr, indexer])] DEBUG o.x.x.i.SafeTreeMarshaller     - Failed to serialize item [[[C@69022587]
java.lang.OutOfMemoryError: Java heap space
2023-01-02 13:56:00,337 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG o.q.c.QuartzSchedulerThread    - batch acquisition of 0 triggers
2023-01-02 13:56:00,338 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing connections idle longer than 50000 MILLISECONDS
2023-01-02 13:56:02,610 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing expired connections
2023-01-02 13:56:02,611 [Connection evictor] DEBUG ingHttpClientConnectionManager - Closing connections idle longer than 50000 MILLISECONDS
2023-01-02 13:56:02,611 [MetricsHistoryHandler-12-thread-1] DEBUG .a.s.h.a.MetricsHistoryHandler - -- collectMetrics
2023-01-02 13:56:02,611 [MetricsHistoryHandler-12-thread-1] DEBUG .a.s.h.a.MetricsHistoryHandler - --  collecting local jvm...
2023-01-02 13:56:02,611 [MetricsHistoryHandler-12-thread-1] DEBUG .a.s.h.a.MetricsHistoryHandler - --  collecting local node...
2023-01-02 13:56:03,719 [extension.index job group daemon thread - org.xwiki.extension.index.internal.job.ExtensionIndexJob@4b216e29([extension, index])] DEBUG o.x.x.i.SafeTreeMarshaller     - Failed to serialize item [[Lorg.apache.solr.update.VersionBucket;@3354718d]
java.lang.OutOfMemoryError: Java heap space
02-Jan-2023 13:56:02.611 SEVERE [Catalina-utility-2] org.apache.coyote.AbstractProtocol.startAsyncTimeout Error processing async timeouts
        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
                at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
                at org.apache.coyote.AbstractProtocol.startAsyncTimeout(AbstractProtocol.java:633)
                at org.apache.coyote.AbstractProtocol.lambda$start$0(AbstractProtocol.java:618)
                at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
                at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                at java.base/java.lang.Thread.run(Unknown Source)
        Caused by: java.lang.OutOfMemoryError: Java heap space
02-Jan-2023 13:56:05.883 SEVERE [Catalina-utility-2] org.apache.catalina.core.StandardServer.startPeriodicLifecycleEvent Error sending periodic event
        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
                at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
                at org.apache.catalina.core.StandardServer.startPeriodicLifecycleEvent(StandardServer.java:946)
                at org.apache.catalina.core.StandardServer.lambda$startInternal$0(StandardServer.java:936)
                at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
                at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                at java.base/java.lang.Thread.run(Unknown Source)
        Caused by: java.lang.OutOfMemoryError: Java heap space
2023-01-02 13:56:08,025 [MetricsHistoryHandler-12-thread-1] DEBUG .a.s.h.a.MetricsHistoryHandler - --  collecting local core...

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 3"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 5"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "http-nio-8080-Poller"
2023-01-02 13:58:51,211 [extension.index job group daemon thread - org.xwiki.extension.index.internal.job.ExtensionIndexJob@4b216e29([extension, index])] DEBUG o.x.x.i.SafeTreeMarshaller     - Failed to serialize item [java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@44f7135e]
java.lang.OutOfMemoryError: Java heap space

The stack trace in my previous comment is still the most frequently seen failure, but I don’t think it’s the cause here; it looks more like a reaction to the out-of-memory problem… which should be resolved once the CPU usage goes down (at 14:05), but any page you try to load just stays pending forever.

It would be interesting to enable an automatic memory dump on OutOfMemoryError to get a better idea of where all this memory went. See https://dev.xwiki.org/xwiki/bin/view/Community/Debugging#HAnalyzeOutOfMemoryissues for more details on how to enable it.

I’ve set the HeapDumpOnOutOfMemoryError AND HeapDumpPath like this:

      - JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError"
      - JAVA_OPTS="-XX:HeapDumpPath=/usr/local/xwiki/dump"
      - JAVA_OPTS="-Xmx2048m"

But I’m not getting any dumps (while I’m seeing OOM errors like before), I’ve tried a few different paths as well.
I’ve also tried setting it as one:

      - JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/xwiki/dump -Xmx2048m"

But that gives me errors, as it just sees it as 1 really long option instead of 3 separate ones.

I’ll keep trying, but I’m not sure what’s wrong.

That’s definitely not going to work, as each line replaces the previous one from what I understand.

Where are you setting this?

In a docker compose yml, see the yml above (2nd comment on the original post).

If I set them as 1 JAVA_OPTS I’ll get the following:

Configuring XWiki...
 Setting environment variables
   Deploying XWiki in the 'ROOT' context
 Replacing environment variables in files
   Generating authentication validation and encryption keys...
   Setting permanent directory...
   Configure libreoffice...
   Reusing existing config file hibernate.cfg.xml...
   Reusing existing config file xwiki.cfg...
   Reusing existing config file xwiki.properties...
 NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
 Unrecognized VM option 'HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/xwiki/dump -Xmx2048m'
 Did you mean '(+/-)HeapDumpOnOutOfMemoryError'? Error: Could not create the Java Virtual Machine.
 Error: A fatal exception has occurred. Program will exit.

so it looks like it sees it as 1 parameter…

Setting them without quotation marks helped!
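
For anyone hitting the same thing, here is a minimal sketch of the environment section that matches what worked above: one single JAVA_OPTS entry, with no quotation marks around the value. As far as I understand, in the list form of `environment` everything after the `=` is passed through as the value, so surrounding quotes end up inside JAVA_OPTS itself, which would explain the "Unrecognized VM option" error. The service name, image and dump path are the ones already used in this thread; that the dump directory exists and is writable inside the container is an assumption.

```yaml
services:
  test-xwiki-web:
    image: xwiki:14.10.1-mysql-tomcat
    environment:
      # Only one JAVA_OPTS entry: repeated entries for the same variable
      # override each other, so only one of them would take effect.
      # No quotation marks: in list form they would become part of the value.
      - JAVA_OPTS=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/xwiki/dump -Xmx2048m
```

The map form (`JAVA_OPTS: "..."` under `environment:`) should behave the same, since there the quotes are YAML syntax rather than part of the value, but I have not tried that in this setup.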

I’m going to wait until it gets the OOM errors a few times.

Ok,

I’ve had 1 dump (and that’s all I get in one run). I’ve opened it in the Eclipse Memory Analyzer and I’m seeing the following stuff (I’m not well versed on this matter, if I need to show something else I’d be more than happy to comply):
[Images: eight Eclipse Memory Analyzer screenshots of the heap dump]

Looking at the result it does look like solr is a big suspect. I’ve also seen a lot of errors about solr and not serializing stuff…

Is this something you’ve seen before (and hopefully fixable?)

Would it be possible for me to get that memory dump to dig a bit more? Maybe there is sensitive stuff in it?

Sending a DM via Matrix ;)

So, moving on from our DMs (thanks @tmortagne): prod has now restarted twice (because of our healthcheck on it).

It hasn’t dumped anything, I think because prod had already slowed down so much that it triggered the healthcheck before hitting any OOM errors.

But the 2nd time it failed the healthcheck I had the opportunity to get some logs out:

 2023-01-04 08:33:01,065 [recoveryExecutor-23-thread-1-processing-x:events] WARN  o.a.s.u.UpdateLog              - REPLAY_ERR: Exception replaying log
 java.util.concurrent.RejectedExecutionException: null
 	at org.apache.solr.util.OrderedExecutor.execute(OrderedExecutor.java:65)
 	at org.apache.solr.update.UpdateLog$LogReplayer.execute(UpdateLog.java:2058)
 	at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1922)
 	at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1784)
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 	at java.base/java.lang.Thread.run(Unknown Source)
 2023-01-04 08:33:01,066 [recoveryExecutor-23-thread-1-processing-x:events] WARN  o.a.s.u.UpdateLog              - REPLAY_ERR: Exception replaying log
 java.util.concurrent.RejectedExecutionException: null
 	at org.apache.solr.util.OrderedExecutor.execute(OrderedExecutor.java:65)
 	at org.apache.solr.update.UpdateLog$LogReplayer.execute(UpdateLog.java:2058)
 	at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1922)
 	at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1784)
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 	at java.base/java.lang.Thread.run(Unknown Source)
 2023-01-04 08:33:01,066 [recoveryExecutor-23-thread-1-processing-x:events] WARN  o.a.s.u.UpdateLog              - REPLAY_ERR: Exception replaying log
 java.util.concurrent.RejectedExecutionException: null
 	at org.apache.solr.util.OrderedExecutor.execute(OrderedExecutor.java:65)
 	at org.apache.solr.update.UpdateLog$LogReplayer.execute(UpdateLog.java:2058)
 	at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1922)
 	at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1784)
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 	at java.base/java.lang.Thread.run(Unknown Source)

though it’s not much (the same error every few milliseconds).

I’ll make the healthcheck a bit more lenient and hope to capture something more next time.

It’s the first time I’ve seen this type of log. It seems to suggest that Solr is not in great shape.

It might be interesting to try a standalone Solr setup, as embedded mode tends to cause performance problems as the database grows, and I remember you mentioned having quite a lot of users. Note that if you go for this, you should also move the xwiki_ratings and xwiki_events cores (in embedded mode they are located in <permdir>/store/solr, without the xwiki_ prefix) to the standalone Solr instance, as these two contain unique data which cannot be reconstructed (the two others will be reindexed automatically, but that could take a bit of time).

You’re scaring me…

Isn’t there a way to just hard reset the whole Solr instance, hopefully fixing the problem?
Also, can I deduce from this that importing our pages into a new xwiki instance made Solr crap itself?

I’ll try adding the standalone Solr to our test instance first then…

It’s easy, but as I said, you might lose data by doing that for the ratings core (meaning you lose likes) and the events core (meaning you lose past notifications), so depending on how much you care about that you could:

  • full wipe: stop XWiki and delete <permdir>/store/solr and <permdir>/cache/solr
  • keep some cores: stop XWiki and delete <permdir>/store/solr/<cores you don't want to keep> and <permdir>/cache/solr
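For what it’s worth, the two variants above could be scripted roughly like this. This is only a sketch in Python, not an official XWiki tool: the function name `wipe_solr_data` and its `keep_cores` parameter are made up for illustration, it assumes each core is a sub-directory of <permdir>/store/solr, and it must only be run while XWiki is stopped.

```python
import shutil
from pathlib import Path
from typing import Iterable


def wipe_solr_data(permdir: Path, keep_cores: Iterable[str] = ()) -> list[Path]:
    """Delete embedded Solr data under permdir, keeping the named cores.

    Hypothetical helper: run it only while XWiki is stopped.
    Returns the list of paths that were removed.
    """
    keep = set(keep_cores)
    removed: list[Path] = []
    store = permdir / "store" / "solr"
    cache = permdir / "cache" / "solr"

    if store.is_dir():
        if not keep:
            # Full wipe: remove the whole store/solr directory.
            shutil.rmtree(store)
            removed.append(store)
        else:
            # Partial wipe: remove every core directory not listed in keep.
            # Only sub-directories are treated as cores; loose files are left alone.
            for entry in sorted(store.iterdir()):
                if entry.is_dir() and entry.name not in keep:
                    shutil.rmtree(entry)
                    removed.append(entry)

    # The Solr cache directory is deleted in both variants.
    if cache.is_dir():
        shutil.rmtree(cache)
        removed.append(cache)

    return removed
```

With a permanent directory at, say, /usr/local/xwiki/data (an example path, adjust to your setup), `wipe_solr_data(Path("/usr/local/xwiki/data"))` would do the full wipe, and `wipe_solr_data(Path("/usr/local/xwiki/data"), keep_cores=["ratings", "events"])` would keep the two cores that cannot be rebuilt.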

We don’t really use the rating system, and losing past notifications is a hit I’m willing to take.

We’ll do a full wipe and see how xwiki reacts to that… Doing it now!
I’ll update the post :wink:

Note that by “causing problems” I only meant performance problems; it’s the first time I’ve seen something that looks like corruption (if that’s really it).

It has been running for a bit more than 12 hours now and I’m cautiously enthusiastic about how ‘healthy’ the logs look!

The only warnings I’m seeing right now are the ExtensionIndexJob/UnknownHostException ones, but that’s because it can’t reach the internet! (I understand there is a ticket for that, so it’s all good!).

I’ll keep an eye on it for now and will post any updates (or mark this as solved once it has been online for 14+ days without failure).

Not sure which ticket you are referring to. Generally the best approach here is to explicitly tell XWiki that no extension repository is available, by setting extension.repositories= in xwiki.properties, so that it does not even try.
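If you would rather script that change than edit the file by hand (for example while building the image), something along these lines might do. It is only a sketch: `disable_extension_repositories` is a made-up helper, and it assumes the simple `key=value` form with `#` comments; it does not cover every syntax a Java properties file allows.

```python
def disable_extension_repositories(properties_text: str) -> str:
    """Return the properties text with one empty extension.repositories entry.

    Hypothetical helper: active (uncommented) definitions of the key are
    dropped, commented-out lines and all other keys are kept as they are.
    """
    key = "extension.repositories"
    kept = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        # Compare only the part before the first "=" against the key.
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            continue
        kept.append(line)
    # An empty value means "no repository configured".
    kept.append(f"{key}=")
    return "\n".join(kept) + "\n"
```

You would read xwiki.properties, pass its content through this function, and write the result back before starting XWiki.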

I’ll add it.

We had another crash (same M.O.), but the dump was ‘only’ 0 bytes; I guess docker restarted the container a bit too fast… We changed the healthcheck again to be more lenient. The logs did say it was an OOM error.