Distributing XWiki in Jetty-based rootless containers

Hi everyone,

Recently, XWiki SAS had the chance to work with Germany on a project called openDesk ([1],[2]), which aims to create a sovereign, entirely Open Source alternative to Office 365 or Google Workspace.

This project made it possible to finance several improvements in the core of XWiki, but we had to keep some of them out of XWiki Core, mainly for practical reasons. In particular, we decided to provide XWiki bundled in a rootless Docker container based on the jetty-alpine image ([3]) instead of the standard Tomcat image.

The use of jetty-alpine as a base image was motivated by two main reasons:

  • Image size: jetty:9-jre11-alpine is 180MB, whereas tomcat:9-jre11 is 275MB.
  • Security:
    • The Tomcat image is based on eclipse-temurin:11-jre-jammy, which is itself based on Ubuntu Jammy’s base image. Everything inherited from the Ubuntu base image adds to the attack surface: an attacker gaining access to a shell in the container would have more tools at their disposal to alter it (using apt to install extra packages, for example).
    • Rootless: the Jetty image allows running XWiki as a rootless container, which is currently not the case for the Tomcat container ([4]).

Specifically for this project, we also had some licensing-related issues: publishing any code on OpenCode (the platform used for Germany’s Open Source projects) requires passing a thorough license check against a list of whitelisted licenses. Some components included in the Tomcat image did not pass these checks.

Today, we see more and more professional users of XWiki asking for Docker images that are slimmer and expose less attack surface, in order to meet their security needs. As such, I believe it could be valuable to start providing a distribution of XWiki Docker images based on Jetty. We could reuse the work @gsautner has already done in a repository specific to openDesk ([5]), as this code is already licensed under LGPLv2.

What do you think?

Thanks,
Clément

[1] BMI / openDesk - Der souveräne Arbeitsplatz / Info · GitLab
[2] CIO Bund - Souveräner Arbeitsplatz
[3] Docker
[4] running as non-root: Unable to create directory for deployment: [/usr/local/tomcat/conf/Catalina/localhost] · Issue #209 · docker-library/tomcat · GitHub
[5] distribution/docker · master · XWiki SAS / openDesk / XWiki · GitLab


Hi Clement,

Thanks for raising this interesting topic!

The main reason I used the default Tomcat image tag when creating our Docker image was to get a full OS for a production-ready image. We need a base OS that is acceptable to the majority of our users.

I did a quick search about using Alpine in production and found this:

Alpine is a popular choice for creating Docker images because it’s lightweight, fast, and secure. However, whether you should use Alpine-based images for production depends on a few factors.

Firstly, it’s important to consider the specific requirements of your application. If your application requires specific packages or dependencies that are not available in Alpine’s repositories, using an Alpine-based image may not be the best choice.

Secondly, it’s important to consider the trade-off between image size and functionality. While Alpine-based images are smaller and faster, they may not offer the same level of functionality as larger images based on other operating systems. If your application requires a lot of dependencies or libraries, it may be more appropriate to use a larger image that includes everything you need.

Thirdly, it’s important to consider the security implications of using Alpine. While Alpine is known for its security, it’s still important to keep your images up to date with security patches and to follow best practices for securing your containers.

Ultimately, the decision of whether to use Alpine-based images for production depends on your specific use case and requirements. It’s important to carefully consider the trade-offs between image size, functionality, and security, and to choose the option that best fits your needs.

Source: https://www.quora.com/Should-I-use-Alpine-based-docker-images-for-production

Running Tomcat as non-root is actually easy to fix, and it can be done in our own Docker image. See Loading...

Ofc it’s also possible to support several images, but it’s more costly as it requires more maintenance, and right now we’re running on minimal maintenance (just me ATM :)) for our Docker images (adding Alpine on top of Debian would require supporting 3 more images). Adding more images would require finding more committers to maintain them.

Thus for me the real question is whether switching to alpine for our current 3 images would be a good move for our users or not. I don’t know the answer. It could be interesting to ask our users.

Thanks

I don’t think we should necessarily switch all our Docker images to Jetty-based ones. IMO the best would be to provide two sets of images, one based on Tomcat (the current one) and one based on Jetty.

As you said, the choice of images comes with some tradeoffs. As an example, the jetty-alpine image I described above does not come with LibreOffice installed, and the XWiki instance needs to rely on an external LO server. But this is an acceptable choice for users who run XWiki in an environment that needs to follow some compliance rules.

It seems that the GitHub issue mentioned in this ticket (tomcat:9-jre8-alpine permission problems while running as a non-root user · Issue #147 · docker-library/tomcat · GitHub) is obsolete as Tomcat does not publish alpine-based images on Docker. I do not think it would be that easy to build a rootless docker container based on Tomcat’s official images.

Some comments/thoughts:

  • I had forgotten the Jetty vs Tomcat part of your message. I don’t see why we would provide Jetty images. That’s more work, and if we do this then we have a matrix issue (some users will want Jetty on Debian, for example). Also, all our stats show that the vast majority of XWiki users use Tomcat and not Jetty.
  • Rootless tomcat doesn’t depend on debian vs alpine. I feel it’s easy to have Tomcat not started as root on Debian (basically you just need to chmod the tomcat files and ofc create a user). In any case that’s very easy to check and I’d like to try this when I get some time.
  • The tomcat image also doesn’t come with LO installed. We install it in our Dockerfile. Are you saying that LO cannot be installed on alpine?
  • Generally speaking, I don’t think it’s a good idea to provide another set of images for 2 reasons: almost nobody has asked for it (if you check the stats for the xwiki docker images and how many request it, you’ll find it at 0.0001% or probably even less), and it’s more maintenance. As I said, we don’t even have someone maintaining the current images and I do it in my free time only. I don’t feel it’s worth it to burden us more for so few users.
  • What could be done could be for the community to provide an Alpine-based Dockerfile, not maintained by the XWiki core dev team.

Thanks

I’m not sure how the matrix issue is not already a problem: we already support Jetty as a servlet container and we even provide install instructions for it.

Additionally, as far as I know, it is not possible to run the Tomcat Debian package we provide today on the latest Debian Stable (XWIKI-21137). One of the possible solutions for this issue would be to consider providing a Jetty-based package. So we may actually have to look into providing a Jetty Debian package in any case, regardless of this discussion.

I believe these statistics are skewed: we have been promoting Tomcat-based deployments for ages, advertising them as the go-to method for production deployments. This recommendation has also been followed by companies providing services around XWiki.

To be clear, I’m pretty sure that we can improve the security of our tomcat-based images, however I believe that this would actually cost us more in terms of maintenance time than just using jetty-alpine based images.

There’s a reason why we decided to go with jetty-alpine as part of the openDesk project I talked about at the beginning of this thread: it is actually much easier to start from a jetty-alpine image to achieve some level of security than to start from one of the official Tomcat images and then try to trim it down to a minimum. By using the jetty-alpine image, we don’t have to care about setting up the servlet container on a slimmer base image: the image itself is already slim. The Dockerfile used to build the image is actually very similar to the one we use today in xwiki-docker to build our Tomcat-based images.

No, LO can be installed on Alpine. What I’m saying is that entities looking to deploy XWiki within a more secure environment would look into taking the LO server out of the Docker image and running it in a separate container, for example.

I would not agree that almost nobody has asked for it. There are IT teams looking for that sort of image. However, I do not believe that these are the kind of requests that would end up on a community forum.

At XWiki SAS, we get multiple requests from government entities and corporations that need some kind of security and compliance guarantees when it comes to deploying XWiki. Providing a slimmer, more secure Docker image is a solution for this class of users. These users may not be seen directly on this forum, but they contribute heavily to the life of the XWiki software, either by performing large deployments (which brings visibility) or by paying for support subscriptions, which allows financing developers in the community.

We could discuss this point I believe. In general, we need to make sure that we keep our Docker images properly up to date.

Thanks,
Clément

See Loading...

There’s a reason we’ve provided a Tomcat-only solution for some installation methods (apt & docker only AFAIK, while the demo package is on Jetty). It’s because Tomcat is more popular and is what users want to use, so when we need to cut maintenance costs, we keep the most used one. A quick Google search turned up: Apache Tomcat VS Jetty - Web Server Technologies Market Share Comparison

When I say “what users want to use”, I mean what companies want to use since a good majority of them wouldn’t even consider using Jetty as they have standardized on Tomcat for other web apps already and don’t want to introduce another contender (training for infra teams for fine tuning the servlet container for performance/security/etc, new processes, etc).

Ofc if you have all the money and time in the world, you can decide to support all servlet containers. But that’s not the case, which is why, to cut maintenance costs, we’ve decided to only support Tomcat in some cases (apt & docker).

I wasn’t aware of the Debian packaging issue, but I don’t see how it will not be an issue with Jetty too. If it’s not one today, it soon will be, I guess. What it means is that we need to move to jakarta packages ASAP. For me a Jetty package doesn’t solve that much, especially since a lot of business users may want to stay on Tomcat.

ok, because we need consistency (either the Tomcat/debian image doesn’t provide LO anymore or if we add a new alpine-based image it should also have LO installed). It’s too much IMO to provide all the options. If we don’t provide LO anymore, we’ll need to document how to add it, which is also interesting (requires some research and testing work).

Sure, but that’s still 0.0001% of the need. BTW I knew about this, hence the 1 in the figure :)

And since you talk about XWiki SAS, it could make more sense for it to address these paying customers there rather than on the xwiki.org side. OTOH, I fully agree that we also need to focus on the security aspects on the xwiki.org side.

IMO, that’s the main point to discuss really. If you have time and money, it’s easy to support lots of different images for all types of needs. If you don’t have that then you need to take decisions to reduce the # of images.

Thanks

I’m currently rebuilding a custom image just so that XWiki does not run as root. Could an upstream fix make this more streamlined? The main issue is that once you chown the path /usr/local/tomcat to UID/GID 30001 (for example), you also need to run the container with the same UID/GID.

I’ve just added (above the entrypoint):

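# Give the non-root UID/GID ownership of the Tomcat directory
RUN chown -R 30001:30001 /usr/local/tomcat
# Run everything after this point, including the entrypoint, as that user
USER 30001:30001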

to the Dockerfile and I’m running the container with that UID/GID.

Can it be more transparent to the user (able to pick whichever UID/GID they like), maybe some additional logic in docker-entrypoint.sh?
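
To make the question more concrete, here is a minimal sketch of what such logic in docker-entrypoint.sh might look like. Everything in it is an assumption for discussion: the RUN_AS_UID and RUN_AS_GID variables and the two helper functions are hypothetical names, and nothing like this exists in the current image as far as I know. It only validates the requested IDs and fixes ownership of /usr/local/tomcat when the container is started as root.

```shell
#!/bin/sh
# Sketch only: RUN_AS_UID / RUN_AS_GID and both functions are hypothetical
# names, not something the current image or docker-entrypoint.sh provides.

# Print "uid:gid" when both values are plain numbers; fail otherwise.
resolve_run_as() {
  uid="$1"
  gid="$2"
  case "$uid" in
    ''|*[!0-9]*) echo "invalid UID: '$uid'" >&2; return 1 ;;
  esac
  case "$gid" in
    ''|*[!0-9]*) echo "invalid GID: '$gid'" >&2; return 1 ;;
  esac
  printf '%s:%s\n' "$uid" "$gid"
}

# Hand the Tomcat directory over to the requested owner. Changing ownership
# to an arbitrary user generally requires root, so this does nothing when
# the container is already started as a non-root user.
prepare_tomcat_dir() {
  owner="$1"
  dir="${2:-/usr/local/tomcat}"
  if [ "$(id -u)" = "0" ]; then
    chown -R "$owner" "$dir"
  fi
}

# Intended use near the top of the entrypoint (30001 is just the example
# value used above):
#   owner="$(resolve_run_as "${RUN_AS_UID:-30001}" "${RUN_AS_GID:-30001}")" || exit 1
#   prepare_tomcat_dir "$owner"
```

Dropping privileges after the chown would still need a helper such as gosu or su-exec in the image, which is an extra assumption on my side. The other route is to keep the image non-root and only document which directories must be writable for the chosen UID/GID.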

https://wiki.behemoth.co.il/bin/view/Public/Kubernetes%20Apps/XWiki%20(Helm)/