I upgraded from XWiki 10.11.2 to 10.11.3 (LTS) two days ago. The server runs Debian 9 and I’m using the XWiki Debian packages.
After one day (yesterday), OpenLDAP running on this server (and also used by XWiki) failed due to file handle exhaustion. Since I needed the system working again, I restarted OpenLDAP, which resolved the immediate problem.
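For anyone wanting to check how close a process is to its file handle limit, something along these lines might help. It is only a sketch: it assumes a Linux /proc filesystem, and the function names `parse_nofile_soft_limit` and `fd_usage` are made up for illustration.

```python
import os


def parse_nofile_soft_limit(limits_text):
    """Return the soft 'Max open files' limit from the text of
    /proc/<pid>/limits, or None if it is unlimited or not listed."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line[len("Max open files"):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for the given PID.

    Reading another user's /proc/<pid>/fd normally requires root.
    """
    open_fds = len(os.listdir("/proc/{}/fd".format(pid)))
    with open("/proc/{}/limits".format(pid)) as handle:
        soft_limit = parse_nofile_soft_limit(handle.read())
    return open_fds, soft_limit
```

Called as root with the PID of slapd (for example as reported by `pidof slapd`), `fd_usage(pid)` returns the number of open descriptors next to the soft limit, so one can see whether the count is creeping towards it.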
Today we ran into the same issue, and I collected some information from the system before restarting slapd again. “lsof” showed thousands of established TCP connections between the java and slapd processes.
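A count like this can be reproduced with a small script along the following lines. This is a rough sketch, not the exact commands used above: it assumes slapd listens on the default LDAP port 389 (adjust `LDAP_PORT` otherwise), and the helper names are hypothetical.

```python
import subprocess
from collections import Counter

LDAP_PORT = 389  # assumption: slapd listens on the default LDAP port


def count_established(lsof_output, port=LDAP_PORT):
    """Count ESTABLISHED TCP connections per command name where either
    end of the connection uses `port`."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Data lines end with "<local>-><remote> (ESTABLISHED)";
        # the header line and anything else is skipped.
        if len(fields) < 10 or fields[-1] != "(ESTABLISHED)":
            continue
        local, _, remote = fields[-2].partition("->")
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            counts[fields[0]] += 1
    return counts


def run_lsof():
    """Return the output of lsof for established TCP connections.

    -n and -P disable host and port name lookups; run as root to see
    sockets owned by other users.
    """
    result = subprocess.run(
        ["lsof", "-nP", "-iTCP", "-sTCP:ESTABLISHED"],
        stdout=subprocess.PIPE, universal_newlines=True, check=False,
    )
    return result.stdout
```

Running `count_established(run_lsof())` as root gives one total per command name. When both processes are on the same host, each connection shows up twice, once under java and once under slapd.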
The only Java processes running on that machine belong to Tomcat, which only hosts XWiki.
I also installed Debian security updates two days ago, but besides the XWiki packages essentially nothing related to either OpenLDAP or Java was upgraded:
Start-Date: 2019-03-13 17:05:19
Commandline: apt-get upgrade
Upgrade: libopenjp2-7:amd64 (2.1.2-1.1+deb9u2, 2.1.2-1.1+deb9u3), php7.0-bz2:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-cli:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3),
php7.0-gd:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-opcache:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3),
php7.0-recode:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-common:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), openssh-sftp-server:amd64 (1:7.4p1-10+deb9u5,
1:7.4p1-10+deb9u6), php7.0-json:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3),
php7.0-mbstring:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-readline:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-xml:amd64 (7.0.33-0+deb9u2,
7.0.33-0+deb9u3), php7.0-curl:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-zip:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-ldap:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3),
php7.0-mcrypt:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-imap:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), openssh-server:amd64 (1:7.4p1-10+deb9u5,
1:7.4p1-10+deb9u6), php7.0-intl:amd64 (7.0.33-0+deb9u2, 7.0.33-0+deb9u3),
openssh-client:amd64 (1:7.4p1-10+deb9u5, 1:7.4p1-10+deb9u6), libapache2-mod-php7.0:amd64
(7.0.33-0+deb9u2, 7.0.33-0+deb9u3), php7.0-mysql:amd64 (7.0.33-0+deb9u2,
7.0.33-0+deb9u3)
End-Date: 2019-03-13 17:07:02
Start-Date: 2019-03-13 17:32:17
Commandline: apt-get upgrade
Upgrade: xwiki-mysql-common:amd64 (10.11.2, 10.11.3), xwiki-common:amd64 (10.11.2, 10.11.3),
xwiki-tomcat8-mysql:amd64 (10.11.2, 10.11.3), xwiki-tomcat8-common:amd64 (10.11.2, 10.11.3)
End-Date: 2019-03-13 17:33:05
I therefore assumed that the issue was related to the XWiki upgrade and have just downgraded to 10.11.2.
I’m not sure whether this changed anything, though: Tomcat has only been running for about an hour but already has about 205 established connections to slapd, and the only activity was me running some tests from the browser against the server, plus maybe two other users.
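To see whether these connections are really all established or partly half-closed (CLOSE_WAIT, for example), the kernel's socket table can be summarised by state. The sketch below assumes Linux, the default LDAP port 389, and uses hypothetical function names; the state codes are the standard ones from the Linux kernel.

```python
from collections import Counter

LDAP_PORT = 389  # assumption: slapd listens on the default LDAP port

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def states_for_port(table_text, port=LDAP_PORT):
    """Count sockets per TCP state where the local or remote port
    equals `port`, given the text of /proc/net/tcp or /proc/net/tcp6."""
    counts = Counter()
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with an index such as "12:"; skip the header.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


def ldap_socket_states(port=LDAP_PORT):
    """Sum the per-state counts over the IPv4 and IPv6 socket tables."""
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as handle:
                total += states_for_port(handle.read(), port)
        except OSError:
            pass  # table not available on this system
    return total
```

Calling `ldap_socket_states()` a few times, some minutes apart, would show whether the ESTABLISHED count keeps growing. For local connections both ends are listed, so each connection is counted twice.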
Both processes belonging to the connections (slapd and java) are waiting in a “futex()” syscall when I strace them, but that probably doesn’t say much…
I don’t really have a clue how to debug this behaviour - are that many established connections to be expected? I’d rather expect the LDAP connections to be pretty short-lived, created and closed as needed, with maybe only a small pool of connections being kept open for caching purposes…