Xwiki not responding to requests

I am stumped. I am running Ubuntu 22.04, xwiki 15.1, with nginx as a reverse proxy and mysql all on the same host. nginx error log is telling me it can’t connect to xwiki running on localhost:8080:

2023/03/21 05:01:21 [error] 1057#1057: *2366 connect() failed (111: Unknown error) while connecting to upstream, client: 54.183.255.133, server: wiki.swansway.com, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "wiki.swansway.com"
2023/03/21 05:01:22 [error] 1057#1057: *2367 connect() failed (111: Unknown error) while connecting to upstream, client: 177.71.207.165, server: wiki.swansway.com, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "wiki.swansway.com"
2023/03/21 05:01:23 [error] 1057#1057: *2370 connect() failed (111: Unknown error) while connecting to upstream, client: 54.252.79.165, server: wiki.swansway.com, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "wiki.swansway.com"
2023/03/21 05:01:25 [error] 1057#1057: *2372 connect() failed (111: Unknown error) while connecting to upstream, client: 54.232.40.69, server: wiki.swansway.com, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "wiki.swansway.com"
2023/03/21 05:03:08 [error] 1057#1057: *2414 upstream timed out (110: Unknown error) while reading response header from upstream, client: 158.247.68.206, server: wiki.swansway.com, request: "GET /xwiki/bin/view/Main/ HTTP/1.1", upstream: "http://127.0.0.1:8080/xwiki/bin/view/Main/", host: "wiki.swansway.com", referrer: "https://wiki.swansway.com/xwiki/bin/view/Main/Wiki%20Updates/"
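
A side note on those log lines: nginx prints "Unknown error", but on a typical Linux box errno 111 is ECONNREFUSED and 110 is ETIMEDOUT, so the log is actually showing two different failure modes. Below is a minimal Python sketch for tallying them from an error log. The function name and the hard-coded errno table are my own, and the table assumes the usual Linux numbering.

```python
import re
from collections import Counter

# Assumption: generic Linux errno numbering for the two codes seen in the log.
LINUX_ERRNO = {110: "ETIMEDOUT", 111: "ECONNREFUSED"}

# Matches the "(111: ...)" part of an nginx error line.
ERRNO_RE = re.compile(r"\((\d+): [^)]*\)")

def tally_upstream_errors(lines):
    """Count nginx upstream errors by errno name."""
    counts = Counter()
    for line in lines:
        if "upstream" not in line:
            continue
        match = ERRNO_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[LINUX_ERRNO.get(code, f"errno {code}")] += 1
    return counts

# Two lines shortened from the log above.
sample = [
    "2023/03/21 05:01:21 [error] 1057#1057: *2366 connect() failed "
    "(111: Unknown error) while connecting to upstream",
    "2023/03/21 05:03:08 [error] 1057#1057: *2414 upstream timed out "
    "(110: Unknown error) while reading response header from upstream",
]
print(dict(tally_upstream_errors(sample)))
```

A mix of refusals and timeouts would suggest the backend is sometimes down entirely and sometimes up but not answering, which is worth keeping apart when reading the rest of the logs.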

I am not seeing anything error-related in catalina.out. When I run netstat -tunlp I see a big backup in recv-q:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
[...snip...]
tcp6     101      0 :::8080                 :::*                    LISTEN      3430/java           
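
A note on that Recv-Q value: on Linux, for a socket in LISTEN state, Recv-Q counts connections that are waiting for the application to accept() them, not unread bytes. 101 is suspiciously close to Tomcat's default acceptCount of 100, which would suggest the accept queue is full and the Java process is not picking up connections. That is a guess from the numbers, not something the logs confirm. Here is a rough Python sketch that flags such listeners in netstat -tunlp output. The function name is made up, and the column layout is assumed to match the output above.

```python
def full_listen_queues(netstat_output, threshold=1):
    """Return (local_address, recv_q, program) for LISTEN sockets with queued connections."""
    flagged = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        recv_q = int(fields[1])
        if recv_q >= threshold:
            flagged.append((fields[3], recv_q, fields[6]))
    return flagged

# The sshd row is invented for contrast; the java row is from the output above.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      812/sshd
tcp6     101      0 :::8080                 :::*                    LISTEN      3430/java
"""
print(full_listen_queues(sample))
```

If the queue really is full, a thread dump of the Java process would be the natural next place to look, since something is keeping Tomcat from accepting.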

I have no idea where to go for the next clue. I have upgraded everything I can upgrade. Can anyone point me at the next debug step? Or give me a clue what could be going on? It is like linux itself is just not routing the traffic from nginx to xwiki. The weirdest thing is this just started happening out of nowhere in the late evening on a Sunday. Extremely unexpected for our use case. Any advice would be very welcome. Thanks!

edit: note that I tried forcing Java to use the IPv4 network and it didn’t seem to help anything. No system changes would have (should have) happened when this stopped working.

I upgraded everything, including updating ubuntu to 22.04 and tomcat to tomcat9, repaired the various configuration files, and it seems to be working again. No explanation for why it just stopped working, and no clear explanation for why it is working again. No file I could find had any debug information whatsoever. Got lucky, I suppose.

Never mind! It mysteriously stopped responding to requests again! If anyone has an idea how I can debug this, I am all ears.

Setting --connect-timeout on wget results in retries - seems to indicate it is some kind of tcp connection problem to tomcat9. But why?

$ wget localhost:8080 --connect-timeout=30
--2023-03-28 07:03:18--  http://localhost:8080/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:8080... failed: Connection timed out.
Retrying.

--2023-03-28 07:03:49--  (try: 2)  http://localhost:8080/
Connecting to localhost (localhost)|127.0.0.1|:8080... ^C
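
A timeout on connect (rather than an immediate "connection refused") usually means something is still bound to port 8080 but the TCP handshake never completes. On Linux that is consistent with a listener whose accept queue is full, though other causes such as a firewall rule are possible. Here is a small Python sketch that separates the two cases. The function name and return labels are my own.

```python
import socket

def probe(host, port, timeout=5.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # Nothing answered the handshake within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The kernel answered with a reset: nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"

if __name__ == "__main__":
    # Port 8080 matches the Tomcat connector discussed in this thread.
    print(probe("127.0.0.1", 8080, timeout=3.0))
```

Running this in a loop right after each Tomcat restart would show whether a given start came up responsive or not, without waiting on wget retries.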

Adding address=127.0.0.1 to the server.xml Connector tag got tomcat9 to respond again. There was not previously an address attribute on that tag.

However, it was responding previously without that, and I have another server (a staging server) that is set up using the exact same config (running the exact same Ansible provisioning scripts) that does NOT require that address change. I do not have high confidence that this is a lasting solution.
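
To double-check that both servers really end up with the same Connector settings, something like the following Python sketch could dump the Connector attributes from each server.xml for comparison. The function name and the sample XML are hypothetical; a real server.xml will have more elements. When no address attribute is present, Tomcat listens on all local addresses.

```python
import xml.etree.ElementTree as ET

def connector_settings(server_xml_text):
    """Return the attributes of every <Connector> element in a Tomcat server.xml."""
    root = ET.fromstring(server_xml_text)
    return [dict(conn.attrib) for conn in root.iter("Connector")]

# Hypothetical minimal server.xml, for illustration only.
sample = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1" address="127.0.0.1"
               connectionTimeout="20000" redirectPort="8443" />
  </Service>
</Server>"""

for attrs in connector_settings(sample):
    # A missing address attribute means the connector binds to all interfaces.
    print(attrs.get("port"), attrs.get("address", "(all interfaces)"))
```

Running it against the file from each host and diffing the output would rule out (or confirm) a config drift that the provisioning scripts missed.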

When I shut down a non-responsive run of the tomcat server, this is what the catalina.out logs look like: Xwiki problems related to network requests · GitHub

Basically what I am discovering is that when I start tomcat, I may get a responsive server or a non-responsive server. It seems fairly arbitrary which I will get on any particular attempt.

Continuing to post here in case anyone else comes across this problem and knows what to debug or look for to fix.

protocol says tcp6 while the upstream 127.0.0.1 appears tcp4 … maybe that’s something to look at?

… I’m not exactly an expert tho so take it for what it’s worth!

Good luck!

I don’t think it is related. I just removed the address=127.0.0.1 parameter from server.xml and it is still responding to requests, so I believe that is a red herring.

Weirdly, after fixing "My Activity Stream" (which was showing just a stack trace), the server started responding to requests immediately. I don’t want to get my hopes up, but maybe there was a bad startup path that involved that problem. Only time and testing will tell. :)