Memory leak in XWiki?

Hi,

We’ve been struggling with a recurring Java heap space error in XWiki for some time. Currently, -Xms and -Xmx are each set to 4096m. Nevertheless, the error still occurs about once a week.
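For reference, these are standard HotSpot options; a hedged sketch of our settings (the `JAVA_OPTS` variable name and the dump path are assumptions that depend on the packaging, not something taken from the XWiki Debian docs):

```shell
# Standard HotSpot flags; -XX:+HeapDumpOnOutOfMemoryError makes the JVM
# write a .hprof file automatically the next time the heap fills up,
# so a dump is captured even when nobody is watching.
# JAVA_OPTS and the dump path are assumptions -- adapt to your setup.
JAVA_OPTS="$JAVA_OPTS -Xms4096m -Xmx4096m \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/lib/xwiki/dumps"
```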

The XWiki version in use is XWiki Debian 16.10.6.

We’ve now analyzed two Java heap dumps. Maybe someone here can help with the main problem suspects. If anyone is interested, I can provide further information through another channel if needed.
I’m concerned about the connection to JMX. It’s supposed to help monitor the system, not be the cause of the problem. Could JMX be safely disabled?

Problem 1, Dump 1

95 instances of "org.mariadb.jdbc.client.impl.StandardClient", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x7356ed600", occupy 1,275,230,448 (44.28%) bytes.

Most of these instances are referenced from one instance of "java.util.HashMap$Node[]", loaded by "<system class loader>", which occupies 25,840 (0.00%) bytes. The instance is referenced by "org.infinispan.jmx.CacheManagerJmxRegistration @ 0x700b1bdd0", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x7356ed600".

Keywords
org.mariadb.jdbc.client.impl.StandardClient
org.eclipse.jetty.ee8.webapp.WebAppClassLoader
java.util.HashMap$Node[]
org.infinispan.jmx.CacheManagerJmxRegistration

Problem 2, Dump 1

21 instances of "com.xpn.xwiki.XWikiContext", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x7356ed600", occupy 601,172,264 (20.88%) bytes.

Biggest instance:
• com.xpn.xwiki.XWikiContext @ 0x7017ccfe8 - 67,961,632 (2.36%) bytes.

Most of these instances are referenced from one instance of "java.lang.Thread[]", loaded by "<system class loader>", which occupies 1,264 (0.00%) bytes. The instance is referenced by "org.apache.solr.common.util.SolrNamedThreadFactory @ 0x7009d8f60", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x7356ed600".

Keywords
com.xpn.xwiki.XWikiContext
org.eclipse.jetty.ee8.webapp.WebAppClassLoader
java.lang.Thread[]
org.apache.solr.common.util.SolrNamedThreadFactory

Problem 1, Dump 2

92 instances of "org.mariadb.jdbc.client.impl.StandardClient", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x708bcbe60", occupy 1,426,190,352 (38.39%) bytes.

Most of these instances are referenced from one instance of "java.util.HashMap$Node[]", loaded by "<system class loader>", which occupies 25,120 (0.00%) bytes. The instance is referenced by "org.apache.solr.metrics.reporters.jmx.JmxMetricsReporter$JmxListener @ 0x700b87880", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x708bcbe60".

Keywords
org.mariadb.jdbc.client.impl.StandardClient
org.eclipse.jetty.ee8.webapp.WebAppClassLoader
java.util.HashMap$Node[]
org.apache.solr.metrics.reporters.jmx.JmxMetricsReporter$JmxListener

Problem 2, Dump 2

45 instances of "com.xpn.xwiki.doc.XWikiDocument", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x708bcbe60", occupy 589,225,600 (15.86%) bytes.

Keywords
com.xpn.xwiki.doc.XWikiDocument
org.eclipse.jetty.ee8.webapp.WebAppClassLoader

Regards,
Brunt


Hi,

We’ve since increased -Xms and -Xmx to 8 GB, without any success: the “Java heap space” error continues to occur regularly. Here’s a new summary from the Eclipse MAT memory analyzer:

Dump 3

Problem Suspect 1

253 instances of "org.mariadb.jdbc.client.impl.StandardClient", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x6037d4440", occupy 4,010,066,248 (58.85%) bytes.

Most of these instances are referenced from one instance of "java.util.HashMap$Node[]", loaded by "<system class loader>", which occupies 64,784 (0.00%) bytes. The instance is referenced by "org.infinispan.jmx.CacheManagerJmxRegistration @ 0x6003af4e8", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x6037d4440".

Keywords

  • org.mariadb.jdbc.client.impl.StandardClient
  • org.eclipse.jetty.ee8.webapp.WebAppClassLoader
  • java.util.HashMap$Node
  • org.infinispan.jmx.CacheManagerJmxRegistration

Problem Suspect 2

4,664 instances of "org.apache.commons.dbcp2.PoolablePreparedStatement", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x6037d4440", occupy 1,013,582,960 (14.87%) bytes.

Most of these instances are referenced from one instance of "java.util.HashMap$Node[]", loaded by "<system class loader>", which occupies 64,784 (0.00%) bytes. The instance is referenced by "org.infinispan.jmx.CacheManagerJmxRegistration @ 0x6003af4e8", loaded by "org.eclipse.jetty.ee8.webapp.WebAppClassLoader @ 0x6037d4440".

Keywords

  • org.apache.commons.dbcp2.PoolablePreparedStatement

  • org.eclipse.jetty.ee8.webapp.WebAppClassLoader

  • java.util.HashMap$Node

  • org.infinispan.jmx.CacheManagerJmxRegistration

The heap dump shows over 4,600 PoolablePreparedStatement instances, consuming nearly 1 GB of memory. They are held by a HashMap of CacheManagerJmxRegistration. Together with the StandardClient instances, these objects occupy over 70% of the heap. It appears that JDBC objects are not being released correctly or are being cached for too long.

The heap dump analysis shows that PoolablePreparedStatement instances are being held via the Infinispan JMX registry and are not being released. Could it be that prepared statements are not being closed correctly or are being held in the cache for too long?
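To illustrate the suspicion: anything registered with an MBeanServer stays strongly reachable until it is explicitly unregistered, so a JMX registration object (like the CacheManagerJmxRegistration in the dominator path above) can pin everything it references. A minimal, self-contained sketch, not XWiki or Infinispan code; all names are made up:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Demo of JMX-driven retention: an MBean registered with the platform
// MBeanServer remains strongly reachable (and so does everything it
// references) until it is explicitly unregistered.
public class JmxRetentionDemo {
    public interface LeakyBeanMBean { int getSize(); }

    public static class LeakyBean implements LeakyBeanMBean {
        private final byte[] payload = new byte[1024 * 1024]; // retained data
        public int getSize() { return payload.length; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=LeakyBean");
        server.registerMBean(new LeakyBean(), name);
        // The 1 MiB payload is now reachable via the MBeanServer even if
        // no other reference to the LeakyBean exists anywhere.
        System.out.println("registered=" + server.isRegistered(name));
        server.unregisterMBean(name); // release requires explicit unregistration
        System.out.println("registered=" + server.isRegistered(name));
    }
}
```

If something keeps registering objects with JMX but never unregisters them, the retained set grows exactly the way the dumps show.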

It does indeed sound a lot like a database session that is not being released. It also seems very strange to me to cache a PoolablePreparedStatement, since it should always be short-lived (otherwise you get exactly the kind of problems you are describing).

Have you installed any extensions other than the standard flavor on your wiki?

It would probably help a lot in finding which feature is responsible if you could locate the SQL statement(s) associated with all those PoolablePreparedStatement instances somewhere in the memory dump.
Another possibility is trying to find the id of the cache where those are stored.

We’re running a lot of additional extensions and have even written a few Velocity/Groovy scripts and macros ourselves. I can’t rule out that one of these is the cause. So I think it makes more sense to first interpret the dump correctly. Your comment has already helped a bit.

This is my first time analyzing a heap dump, so please forgive my inexperience. So far I’ve only found the SQL statement stored in the PoolablePreparedStatement:

select xwikidocum0_.XWD_LANGUAGE as col_0_0_ from xwikidoc xwikidocum0_ where xwikidocum0_.XWD_WEB=? and xwikidocum0_.XWD_NAME=? and (xwikidocum0_.XWD_LANGUAGE<>'' or (xwikidocum0_.XWD_LANGUAGE is not null) and ('' is null))

The XWD_NAME parameter is "XWiki". I haven’t found the XWD_WEB parameter (yet).

Since the path to the GC root of the PoolablePreparedStatement leads through

  • “org.apache.commons.dbcp2.PoolablePreparedStatement”
  • “org.apache.commons.pool2.impl.DefaultPooledObject”
  • “java.util.concurrent.ConcurrentHashMap$Node”

I’ve read up a bit on Commons DBCP and have now set the following properties in /etc/xwiki/hibernate.cfg.xml:

<property name="hibernate.dbcp.removeAbandonedOnBorrow">true</property>
<property name="hibernate.dbcp.removeAbandonedOnMaintenance">true</property>
<property name="hibernate.dbcp.removeAbandonedTimeout">60</property>
<property name="hibernate.dbcp.logAbandoned">true</property>

I hope that this will firstly prevent further Java heap errors and secondly provide more information about the unclosed statements, without producing too many log entries…
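For anyone following along, this is the leak pattern the removeAbandoned* settings are designed to catch, sketched with a stand-in class rather than the real DBCP API: a pooled statement that is borrowed but never closed stays retained, while try-with-resources guarantees the close.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in sketch (not the real DBCP API) of the statement-leak pattern:
// a borrowed pooled statement that is never closed stays retained.
public class StatementLeakDemo {
    static final AtomicInteger OPEN = new AtomicInteger();

    static class PooledStatement implements AutoCloseable {
        PooledStatement() { OPEN.incrementAndGet(); }   // borrowed from the pool
        public void close() { OPEN.decrementAndGet(); } // returned to the pool
    }

    static void leaky() {
        PooledStatement st = new PooledStatement();
        // ... the query runs, but close() is never called: the pool keeps
        // the statement alive, like the retained PoolablePreparedStatement
        // instances in the dump.
    }

    static void safe() {
        try (PooledStatement st = new PooledStatement()) {
            // close() is guaranteed here, even if the query throws
        }
    }

    public static void main(String[] args) {
        leaky();
        safe();
        System.out.println("still open: " + OPEN.get());
    }
}
```

Running this prints "still open: 1": only the statement from leaky() remains unreturned, which is what removeAbandonedOnBorrow/removeAbandonedOnMaintenance would eventually reclaim and logAbandoned would report.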

I’ll update this thread as soon as I have more information.

Looks pretty basic (it’s just gathering the translations of a specific document), so it’s hard to tie it to a specific feature. It looks exactly like the standard API for getting this information, but that is not really supposed to be cached (not even the result of this query). Is that the only statement you found while looking through several of those cached PoolablePreparedStatement instances?

For the cache, if you cannot find the identifier, it would be interesting to pull up whatever references this cache directly or indirectly. That might help identify the code where it’s retained, and thus which feature it belongs to.

You’re right: the SQL statement isn’t always the same. I examined several PoolablePreparedStatement instances at random and found various SQL statements:

select xwikidocum0_.XWD_ID as col_0_0_ from xwikidoc xwikidocum0_ where xwikidocum0_.XWD_ID=?

select xwikircsno0_.XWR_DOCID as xwr_doci1_33_0_, xwikircsno0_.XWR_VERSION1 as xwr_vers2_33_0_, xwikircsno0_.XWR_VERSION2 as xwr_vers3_33_0_, xwikircsno0_.XWR_ISDIFF as xwr_isdi7_33_0_, xwikircsno0_.XWR_PATCH as xwr_patc8_33_0_ from xwikircs xwikircsno0_ where xwikircsno0_.XWR_DOCID=? and xwikircsno0_.XWR_VERSION1=? and xwikircsno0_.XWR_VERSION2=?

select xwikispace0_.XWS_ID as xws_id1_35_0_, xwikispace0_.XWS_REFERENCE as xws_refe2_35_0_, xwikispace0_.XWS_NAME as xws_name3_35_0_, xwikispace0_.XWS_PARENT as xws_pare4_35_0_, xwikispace0_.XWS_HIDDEN as xws_hidd5_35_0_ from xwikispace xwikispace0_ where xwikispace0_.XWS_ID=?

update xwikidoc set XWD_FULLNAME=?, XWD_NAME=?, XWD_TITLE=?, XWD_LANGUAGE=?, XWD_DEFAULT_LANGUAGE=?, XWD_TRANSLATION=?, XWD_DATE=?, XWD_CONTENT_UPDATE_DATE=?, XWD_CREATION_DATE=?, XWD_AUTHOR=?, XWD_CONTENT_AUTHOR=?, XWD_CREATOR=?, XWD_WEB=?, XWD_CONTENT=?, XWD_VERSION=?, XWD_CUSTOM_CLASS=?, XWD_PARENT=?, XWD_CLASS_XML=?, XWD_ELEMENTS=?, XWD_DEFAULT_TEMPLATE=?, XWD_VALIDATION_SCRIPT=?, XWD_COMMENT=?, XWD_MINOREDIT=?, XWD_SYNTAX_ID=?, XWD_HIDDEN=?, XWD_ORIGINAL_METADATA_AUTHOR=?, XWD_ENFORCE_REQUIRED_RIGHTS=? where XWD_ID=?

select integerpro0_.XWI_ID as xwp_id1_32_0_, integerpro0_.XWI_NAME as xwp_name2_32_0_, integerpro0_1_.XWP_CLASSTYPE as xwp_clas3_32_0_, integerpro0_.XWI_VALUE as xwi_valu3_23_0_ from xwikiintegers integerpro0_ inner join xwikiproperties integerpro0_1_ on integerpro0_.XWI_ID=integerpro0_1_.XWP_ID and integerpro0_.XWI_NAME=integerpro0_1_.XWP_NAME where integerpro0_.XWI_ID=? and integerpro0_.XWI_NAME=?

select baseproper0_.XWP_CLASSTYPE as col_0_0_ from xwikiproperties baseproper0_ where baseproper0_.XWP_ID=? and baseproper0_.XWP_NAME=?

delete from xwikidocumentindexingqueue where XWT_DOC_ID=? and XWT_TYPE=? and XWT_VERSION=?

I haven’t found anything about the cache yet. I’m hoping to learn something from the logs now, assuming the abandoned connections actually show up there…

As soon as I find something, I’ll post it here. It usually takes a few days for the problem to reoccur.

I could try to check the memory dump if you want, but that would give me access to potentially sensitive information from your server memory, which is not great.

Thank you for the offer. I might consider it. But first, I’d like to give the log a chance.

Hi,

A quick update from me:

I think I’ve identified and resolved the problem: we use the “LDAP Authenticator” extension to authenticate users. The extension creates its own scheduler jobs to automatically synchronize groups and users overnight.

That in itself isn’t a problem. It becomes one when the same extension is installed in multiple subwikis and the same jobs run there simultaneously. In our case this resulted in various Java exceptions while executing SQL statements (concurrent access to the same rows in the same tables).

Since disabling the jobs in the subwikis, memory usage hasn’t been climbing nearly as much. I’m cautiously optimistic that this was the cause.

Thanks again for the help.

Regards,
B.
