How to begin to troubleshoot an XWiki crash?

Hello.

I run XWiki 16.7.1 under Apache Tomcat 9.0.95 on a Debian-based VM on GCP, with OpenJDK 17.0.13 (2024-10-15).

Yesterday the XWiki instance crashed, and I would like to know how to start troubleshooting the cause. I would also like to know whether I need to enable something on the instance to capture the cause of the next crash.

image
This is what we saw on the page. It seems Apache Tomcat was unable to find the XWiki web application, correct?

Thank you very much for any help.

Here are the last SEVERE entries from yesterday’s Catalina log:

15-Jan-2025 19:47:41.439 SEVERE [Catalina-utility-1] org.apache.catalina.core.ContainerBase.threadStart Exception processing background thread
        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
                at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
                at org.apache.catalina.core.ContainerBase.threadStart(ContainerBase.java:1097)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessorMonitor.run(ContainerBase.java:1141)
                at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
                at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
                at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
                at java.base/java.lang.Thread.run(Thread.java:840)
        Caused by: java.lang.OutOfMemoryError: Java heap space
                at java.base/java.util.TreeMap.addEntry(TreeMap.java:765)
                at java.base/java.util.TreeMap.put(TreeMap.java:828)
                at java.base/java.util.TreeMap.put(TreeMap.java:534)
                at java.base/java.util.TreeSet.add(TreeSet.java:255)
                at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:336)
                at java.base/java.util.TreeSet.addAll(TreeSet.java:309)
                at org.apache.catalina.webresources.Cache.backgroundProcess(Cache.java:212)
                at org.apache.catalina.webresources.StandardRoot.backgroundProcess(StandardRoot.java:608)
                at org.apache.catalina.core.StandardContext.backgroundProcess(StandardContext.java:4817)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1172)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1176)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1176)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1154)
                ... 7 more
15-Jan-2025 19:50:11.639 SEVERE [Catalina-utility-3] org.apache.catalina.core.ContainerBase.threadStart Exception processing background thread
        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
                at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
                at org.apache.catalina.core.ContainerBase.threadStart(ContainerBase.java:1097)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessorMonitor.run(ContainerBase.java:1141)
                at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
                at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
                at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
                at java.base/java.lang.Thread.run(Thread.java:840)
        Caused by: java.lang.OutOfMemoryError: Java heap space

For general debugging:

In your case the cause is clear: you need to increase the memory given to Tomcat. See https://www.xwiki.org/xwiki/bin/view/Documentation/AdminGuide/Performances/#HMemory

Ok, thank you Vincent.

I am inclined to believe that I have a memory leak somewhere. From what I can gather from the ‘memory histogram’ of the instance, memory usage increases over time, even though it’s a relatively small instance (~10 users):


I’m currently running the following settings in setenv.sh (/opt/tomcat/apache-tomcat-9.0.95/bin/setenv.sh), based on the recommended settings from the page you linked. Should I allocate more memory?

export JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1600M \
-Dfile.encoding=UTF-8 \
-Djava.awt.headless=true \
-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true \
-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true \
--add-opens java.base/java.lang=ALL-UNNAMED \
--add-opens java.base/java.io=ALL-UNNAMED \
--add-opens java.base/java.util=ALL-UNNAMED \
--add-opens java.base/java.util.concurrent=ALL-UNNAMED"

If it is a memory leak, how can I find it?

The easiest way to identify what is (maybe wrongly) using all that memory is to enable an automated memory dump.

Hi Thomas.

Excellent,

If I understand the instructions correctly, I should just add those two parameters to my “setenv.sh” file, restart Tomcat, and then wait for the next crash.

My setenv.sh would look like:

export JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1600M \
-Dfile.encoding=UTF-8 \
-Djava.awt.headless=true \
-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true \
-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true \
--add-opens java.base/java.lang=ALL-UNNAMED \
--add-opens java.base/java.io=ALL-UNNAMED \
--add-opens java.base/java.util=ALL-UNNAMED \
--add-opens java.base/java.util.concurrent=ALL-UNNAMED \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/tomcat/apache-tomcat-9.0.95/memdumps"

Am I on the right path?

Thank you.

Yes, that’s the idea. The next time you hit an OutOfMemoryError, Java should take a snapshot of the memory at exactly that moment so that you can take a look at it.
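
If you would rather see the two flags in action before the next production crash, you can provoke an OutOfMemoryError on purpose in a throwaway JVM. This is only a minimal sketch: the class name `HeapDumpDemo` and the `fillHeap` helper are made up for illustration and have nothing to do with XWiki or Tomcat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: exhausts the heap on purpose so that
// -XX:+HeapDumpOnOutOfMemoryError has something to react to.
class HeapDumpDemo {

    /**
     * Allocates 1 MiB chunks and keeps them reachable until either
     * maxChunks is reached or the JVM throws OutOfMemoryError.
     * Returns the number of chunks that were retained.
     */
    static int fillHeap(int maxChunks) {
        List<byte[]> retained = new ArrayList<>();
        try {
            while (retained.size() < maxChunks) {
                retained.add(new byte[1024 * 1024]);
            }
        } catch (OutOfMemoryError e) {
            // The heap dump (if enabled) is written when the error is thrown,
            // so catching it here does not prevent the dump.
        }
        int chunks = retained.size();
        retained.clear(); // release the memory so the JVM can carry on
        return chunks;
    }

    public static void main(String[] args) {
        int chunks = fillHeap(Integer.MAX_VALUE);
        System.out.println("Retained " + chunks + " MiB before stopping");
    }
}
```

Compile it with `javac HeapDumpDemo.java`, then run it with a small heap and the same two flags, for example `java -Xmx64m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp HeapDumpDemo`. When `HeapDumpPath` points to an existing directory, the JVM writes the dump there as `java_pid<pid>.hprof`.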

Thank you!

I’ll follow up on this thread once it crashes again.

Bests,

Hello.
I’ve downloaded the hprof files that were generated on Jan 17 and Jan 20.

I opened them in Eclipse Memory Analyzer and this is what I see:

The file is enormous (~3GB) so I can’t share it here.
What else can I provide?

I’m no Java expert, so this is way beyond my comfort zone. Could someone at least steer me in the right direction, or is there any documentation written for this?

The stack trace is attached here (it’s very long!):
StackTrace.txt (526.6 KB)

And here’s my object list as well (I’m just blindly taking information from the analyzer, I have no clue where to look, hah)

This screenshot seems to suggest that one of the scheduler jobs is using a lot of memory.

This suggests that this memory is spent on loading (and more importantly, keeping) a bunch of documents.

In that stack trace, I see a reference to a LoggingEventListener component which is called for each document save, and which itself seems to cause a document save. Given the size of that stack trace, it seems like this event listener is stuck in a loop where it triggers itself by saving the same document over and over (and eventually fills the memory with XWikiDocument instances).

Feels like someone experimented with listeners starting from https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/Tutorials/WritingEventListenerTutorial/#HLogwhenadocumentismodified.
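
The loop described above can be reproduced in miniature. The sketch below is plain Java, not the XWiki listener API: `Store`, `LOG_DOC`, and `savesFor` are hypothetical names, and the cap of 100 saves only exists so the demo stops instead of running out of memory. It contrasts a listener that reacts to every save, including its own, with one that ignores saves of its own log document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical simulation of a "log on save" listener that saves a document itself.
class ListenerLoopDemo {
    static final String LOG_DOC = "Main.Logs"; // made-up name of the log document
    static final int MAX_SAVES = 100;          // safety cap for the demo only

    /** Stand-in for a wiki store that notifies listeners after each save. */
    static class Store {
        final List<Consumer<String>> listeners = new ArrayList<>();
        int saves = 0;

        void save(String docName) {
            saves++;
            if (saves > MAX_SAVES) {
                throw new IllegalStateException("runaway save loop");
            }
            for (Consumer<String> listener : listeners) {
                listener.accept(docName);
            }
        }
    }

    /** Returns how many saves a single user save causes in total. */
    static int savesFor(boolean guarded) {
        Store store = new Store();
        store.listeners.add(docName -> {
            // Guard: do not react to the save that the listener itself performed.
            if (guarded && docName.equals(LOG_DOC)) {
                return;
            }
            store.save(LOG_DOC); // "logging" by saving fires the listener again
        });
        try {
            store.save("Main.WebHome"); // the one save made by a user
        } catch (IllegalStateException e) {
            // Only reached by the unguarded listener, once the cap is hit.
        }
        return store.saves;
    }

    public static void main(String[] args) {
        System.out.println("guarded saves:   " + savesFor(true));
        System.out.println("unguarded saves: " + savesFor(false));
    }
}
```

Without the cap, the unguarded variant would keep nesting saves until the JVM ran out of stack or heap, which matches the very long stack trace and the pile of retained document instances seen in the dump.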

Thank you Tom.

You’re right. I’m now realizing that a colleague started adding the exact code you linked to every single top-level page on our instance, effectively breaking the site whenever someone opens one of those pages.