Separate Docker-only test builds from standard builds in the CI

Hi everyone,

the current status of our CI, for the past 6 months or more, has been to mix different kinds of builds in the same jobs:

  • standard builds, which execute all tests plus the docker tests if the agent supports docker (now all our agents support docker)
  • docker-only builds, which only execute docker tests on various configurations (we even have different kinds of docker builds for the different sets of configurations we want to test, but that’s not the subject today)

If I’m right, there were two main reasons for mixing different builds in the same job:

  1. to be able to see only one build failing
  2. to manage all branches with only one Jenkins file

Source: https://markmail.org/message/64mtprh5hshg2rtv and discussions on the chat.

Now after several months of using this setup, I don’t think we should continue with it, for two main reasons:

  1. mixing different kinds of builds might lead to false positives: if a docker build passes, a failure in the previous standard build gets “hidden” in the CI
  2. the history is harder to follow: we are supposed to keep 30 builds, but in fact we are keeping 30 mixed docker and standard builds. I don’t recall exactly, but I also think we had an issue at some point when following the history of a failing test, because of the mixed builds.

Another reason is that, now that we have docker agents, we execute the docker tests by default in our standard builds, so we won’t miss a failure on those tests even if we separate out the docker-only builds. So IMO reason 1 above for mixing the builds doesn’t hold anymore.

Now the only remaining problem IMO is how to separate the builds without creating much extra work for managing the branches: AFAIR we cannot have two different maintained jobs based on the same Jenkinsfile.

I don’t know Jenkins well enough to make a good proposal there I think, but I guess we have some options, for example:

  • working only with standard Jenkins job configurations (not based on a Jenkinsfile) that the RM would copy/edit to set the right version after a release
  • putting a dedicated Jenkinsfile in another place and improving our release script to perform the appropriate git actions on that place too

It would indeed be a bit more work for the RM than the automated setup we have, but I think it might be better in the long term not to mix those builds, if only to avoid having to open the history each time to check the real status of the job. WDYT?

Another problem related to history:
We actually don’t enforce keeping both standard and docker builds, but we regularly perform docker builds (cron based), which means that we can end up with some history containing only docker builds. See the current status of our 11.9 branch (screenshot of its build history).

For me (who did the work) there was only one reason: point 2 that you mentioned.

FTR I was the one maintaining the manual Jenkins builds for the different branches and it was a major pain: we were always missing branches, thus having bugs on branches that we didn’t know about (or knew about too late). This is why I spent substantial time fixing this (it took me quite a few days and weekends to succeed; I even wrote a blog post: http://massol.myxwiki.org/xwiki/bin/view/Blog/ScheduledJenkinsfile ;)).

This is why I’m negative ATM about dropping it, until Jenkins offers a solution. I know that the current solution is not perfect but I prefer it over the alternative. And I really, really prefer that we go towards automation (our goal is to fully automate our release process and this would be a step backward).

There are always other ways to do it in Jenkins, like for example the ability to dynamically create and delete jobs in Groovy, but they require substantial work, are prone to failures, and it’s hard to hook them at the right time in the lifecycle.
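
For the record, a minimal sketch of what such dynamic job creation/deletion could look like from the Jenkins script console (the job name is purely illustrative, and this needs admin rights):

```groovy
// Minimal sketch (assumes admin rights in the Jenkins script console).
// The job name is illustrative only.
import jenkins.model.Jenkins
import org.jenkinsci.plugins.workflow.job.WorkflowJob

def jenkins = Jenkins.get()
def jobName = 'xwiki-platform-docker-11.10'

// Create the pipeline job if it doesn't exist yet for a new branch...
if (jenkins.getItem(jobName) == null) {
    jenkins.createProject(WorkflowJob, jobName)
}
// ...and symmetrically, delete it when the branch goes away:
// jenkins.getItem(jobName)?.delete()
```

The code itself is short; the hard part is, as said above, hooking it reliably into the release/branching lifecycle.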

Note that I have also started an evaluation of GitHub Actions to see if it could replace our Jenkins, but so far my answer is no: it’s not ready enough for our needs.

Note that we can fix this one if we want, in our pipeline. For example, before setting a build as succeeded, we could check whether the last build of the other type is also passing and, if not, mark the current build as failing, with a message in the Jenkins UI.
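
As a rough sketch (the sibling job path is hypothetical, and reading other jobs from a pipeline requires running outside the Groovy sandbox or approving the signatures):

```groovy
// Rough sketch: fail the current build if the last completed build of the
// job of the other type is not passing. The job path is hypothetical.
def sibling = jenkins.model.Jenkins.get()
    .getItemByFullName('XWiki/xwiki-platform-docker/master')
def last = sibling?.getLastCompletedBuild()
if (last != null && last.result != hudson.model.Result.SUCCESS) {
    currentBuild.result = 'FAILURE'
    currentBuild.description = "Failing because ${sibling.fullName} is currently failing"
}
```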

Now before this becomes a problem we would need to have our tests passing :slight_smile:

So to conclude, I’d really prefer not to go back, but if all the devs want that, then I’m not going to oppose it. OTOH I’d prefer if someone else did the work (there are substantial changes to be made to our pipeline and Jenkins files, and also for some contrib projects I think).

Note: Indeed you may have forgotten the contrib projects too, which are also supposed to have this setup, so that’s a lot more branches to handle…

We already discussed this and it’s easy to fix (if we want). I don’t see that part as a blocker. The manual part of the alternative is a blocker for me though :wink:

Indeed I forgot about those.

I don’t agree: it’s already a problem. I just noticed that our 11.3 standard build is erroring, which is hidden by the other builds right now.

OK, I don’t recall that discussion.

-1

Should be easy. Not super clean but I prefer it to the current mess.

How would that work? The point of the Jenkinsfile is to be inside the repo it manages (and which has branches). And there can be only a single Jenkinsfile per repo.

The idea would be to have a dedicated repository with the same branch names (talking about platform here, not sure about contrib) whose Jenkinsfile would check out platform and perform the docker tests, to improve the release script to create/delete the right branches on this new repo, and to configure Jenkins to listen for changes on this repo.
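
To illustrate, the Jenkinsfile in that dedicated repo could look roughly like this (a sketch only: the branch-mapping convention and the Maven invocation are assumptions, not our actual pipeline code):

```groovy
// Sketch: Jenkinsfile living in the hypothetical dedicated repo. It checks
// out the matching xwiki-platform branch and runs the docker tests there.
node('docker') {
    stage('Checkout platform') {
        // Assumed convention: the dedicated repo mirrors xwiki-platform's
        // branch names, so the current branch name can be reused directly.
        git url: 'https://github.com/xwiki/xwiki-platform.git',
            branch: env.BRANCH_NAME
    }
    stage('Docker tests') {
        // Illustrative only: run whatever Maven profile/module executes
        // the docker-based tests.
        sh 'mvn verify -Pdocker'
    }
}
```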

From what I understand you’re proposing to have the release script create/delete the branches on these repositories (side note: we don’t use the release script for releasing contrib projects so we’d need a different solution there, and also for PR/feature branches).

Note: It seems to be of similar complexity to simply creating/deleting the Jenkins jobs (there are APIs for that). At least automatic job creation avoids having some useless repos.

There’s nothing to do if you create these repos in the xwiki github org or the xwiki-contrib one since they’re already configured with a GitHub Organization job type.

However, you’ll need a special pipeline for the docker tests executed with cron. So you’ll also need to trigger a build (from the release script or elsewhere) so that the cron is set up.

That was the idea yes.

Indeed, I hadn’t thought about that possibility, but it might be better.

AFAIU that could be done through the jenkins API quite easily, so I don’t think it’s really an issue.

Now, re contrib: you’re saying that right now the same pipeline is used, but I never saw any build from a contrib project running with our docker configurations. Is that expected?

Until JENKINS-43749 (Support multiple Jenkinsfiles from the same repository) is fixed, I suggest using 2 github organization jobs (this is the job type currently used for the XWiki github org), each one pointing to a different Jenkinsfile. The second github org job will be named “XWiki Environment Tests” (vs “XWiki” for the main one).

Specifically, replace the single Jenkinsfile with:

  • Jenkinsfile
  • JenkinsfileEnvironmentTests

Only xwiki-platform will provide a JenkinsfileEnvironmentTests file. xwiki-commons and xwiki-rendering will only provide a Jenkinsfile file.
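
FTR, setting up that second org job could be done by hand (via the Script Path option of the GitHub Organization job) or scripted, for example with Job DSL (a sketch, assuming the Job DSL and GitHub Branch Source plugins; the credentials id is illustrative):

```groovy
// Sketch: a second GitHub organization job scanning the same org but
// building from a different pipeline script.
organizationFolder('XWiki Environment Tests') {
    organizations {
        github {
            repoOwner('xwiki')
            credentialsId('github-credentials') // illustrative id
        }
    }
    projectFactories {
        workflowMultiBranchProjectFactory {
            // Only repos containing this file will get jobs in this folder,
            // which matches the plan above (xwiki-platform only).
            scriptPath('JenkinsfileEnvironmentTests')
        }
    }
}
```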

The JenkinsfileEnvironmentTests pipeline will register a crontab (similar to what we do in Jenkinsfile ATM) to execute the tests later at night.
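
In a scripted pipeline that registration could look like this (the schedule is illustrative; note the trigger only becomes active once the job has run at least once):

```groovy
// Sketch: JenkinsfileEnvironmentTests registering a nightly cron trigger.
// 'H' lets Jenkins pick a hashed minute to spread the load.
properties([
    pipelineTriggers([cron('H 22 * * *')])
])
node('docker') {
    // ... execute the environment/docker tests here ...
}
```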

We would also add the repos from the new “XWiki Environment Tests” github org job to the “Recommended Builds” view in Jenkins.

Once this is proven to work well, we will do the same for the xwiki-contrib github org.

In order to remove the issue of the “XWiki” github org job not having finished executing before the “XWiki Environment Tests” one starts, I propose to use the Jenkins Lockable Resources plugin to create a semaphore and thus have the later job wait for the first one to finish before executing. For example, since we have a cron at 22:00 every day, if there’s a commit at 21:45, the first job will lock the resource, and at 22:00 when the second job starts executing, it’ll wait till the first job finishes before continuing.
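
With that plugin this essentially boils down to wrapping both pipelines in the same lock (a sketch; the resource name is illustrative):

```groovy
// Sketch: both the "XWiki" and "XWiki Environment Tests" pipelines wrap
// their stages in the same lock, so for a given branch they never run
// concurrently; the second one simply waits for the first to finish.
lock("xwiki-platform-${env.BRANCH_NAME}") {
    node('docker') {
        // ... build stages ...
    }
}
```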

WDYT?

+1 globally, just some minor comments below

I guess it would not register a single crontab, but the same ones we have currently in the Jenkinsfile:

  • 1 each day for docker-latest
  • 1 each week for docker-all
  • 1 each month (?) for docker-unstable → not sure, we might not have this one

right?
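
For illustration, all of these could be registered in one pipeline, e.g. with the Parameterized Scheduler plugin’s parameterizedCron (a sketch; schedules and parameter values are illustrative):

```groovy
// Sketch (requires the Parameterized Scheduler plugin): several cron
// schedules in one pipeline, each selecting a different docker test type.
properties([
    parameters([string(name: 'DOCKER_TESTS', defaultValue: 'docker-latest')]),
    pipelineTriggers([parameterizedCron('''
        H 22 * * * %DOCKER_TESTS=docker-latest
        H 22 * * 6 %DOCKER_TESTS=docker-all
        H 22 1 * * %DOCKER_TESTS=docker-unstable
    ''')])
])
```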

I guess we will need to edit the Jenkins views to include jobs from both github organization jobs so that we can have all statuses in the same view.

+1

yes

yes, that’s what I tried to say with:

We would also add the repos from the new “XWiki Environment Tests” github org job to the “Recommended Builds” view in Jenkins.

FTR, a jira issue has been created to track the progress.

Note that it’s very possible that in the future Jenkins will add support for separate build histories inside the same job type. This seems to be what they propose in JENKINS-43749 (Support multiple Jenkinsfiles from the same repository).

If this were to happen, then we would need to move back to what we have now…

We’ll also need to decide if we want separate build histories for the various docker tests (latest vs all vs unsupported). Ideally yes but it could be a bit too much to create 3 new github org jobs.

Any opinion?

FTR there’s still a problem with the separation of the Jenkinsfiles (tracked in a follow-up jira issue).

It’s now been implemented.

Another option would be to keep a single Jenkinsfile but, when the crontab is triggered, trigger another job. The issue is that this would still create a build entry in the main job’s history.
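
A sketch of that option (the downstream job path is hypothetical, and detecting the timer cause this way assumes a reasonably recent Pipeline version):

```groovy
// Sketch: a single Jenkinsfile that, when started by its cron trigger,
// delegates the docker tests to another job instead of running them inline.
def timerCauses = currentBuild.getBuildCauses('hudson.triggers.TimerTrigger$Cause')
if (!timerCauses.isEmpty()) {
    // Cron-triggered: hand off to the (hypothetical) environment tests job.
    build job: "XWiki Environment Tests/xwiki-platform/${env.BRANCH_NAME}",
          wait: false
} else {
    // Commit-triggered: run the normal standard build here.
}
```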