We frequently have the situation that CI builds take very long, leading to a long queue and long wait times, which sometimes also delays releases. I suggest improving that situation by implementing the following changes:
1. Don’t execute Docker-based functional tests as part of the non-environment build; instead, add the Jetty standalone configuration to the environment tests.
2. Move the testrelease build into the environment tests job (or a separate job). To compensate for the lack of JavaDoc validation, enable building JavaDoc in the quality build.
3. Group all environment builds of a single module into a single Jenkins job without cleaning the Maven repository in between. If it isn’t too much work, this could exclude “big” modules like xwiki-platform-flamingo-skin-test-docker, for which we would still have one job per environment.
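To make the grouping in point 3 concrete, here is a minimal sketch of the resulting job layout, assuming a fixed list of environments. The environment names and the `plan_jobs` helper are hypothetical placeholders; only the Flamingo module name comes from the proposal.

```python
from dataclasses import dataclass

# Hypothetical placeholders for the 4 current environments plus Jetty standalone.
ENVIRONMENTS = ["env-1", "env-2", "env-3", "env-4", "jetty-standalone"]


@dataclass(frozen=True)
class Job:
    module: str
    environments: tuple[str, ...]


def plan_jobs(modules, big_modules):
    """Hypothetical planner: one job per module, except big modules get one job per environment."""
    jobs = []
    for module in modules:
        if module in big_modules:
            # Big modules: keep one job per environment so durations stay balanced.
            jobs.extend(Job(module, (env,)) for env in ENVIRONMENTS)
        else:
            # Small modules: run all environments sequentially in one job,
            # reusing the same Maven repository (no cleanup in between).
            jobs.append(Job(module, tuple(ENVIRONMENTS)))
    return jobs
```

With one big module and two small ones, this yields 5 + 1 + 1 = 7 jobs instead of 15.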
The rationale behind this is the following:
1. Significantly reduce the build time of the non-environment builds so we get feedback more quickly. Currently, the first commit of the day starts a build, and even if that commit was early in the morning, from what I remember this build frequently didn’t finish before the afternoon. So there is no way to get feedback on another change made the same day before late afternoon, which is already quite late. This change should also make it more likely that a build triggered by a change committed, e.g., at 17 o’clock finishes before the environment tests start, so the environment tests aren’t skipped (they are skipped if no non-environment build finished before).
2. Same as for point 1. Moreover, testrelease rarely fails with a problem that a different build cannot also catch, in particular if we add JavaDoc generation to the quality build.
3. Reduce the overhead in environment tests. We currently have 47 Docker-based integration test modules, many of which are small and take between 5 and 10 minutes to execute. Therefore, even small overheads like downloading dependencies (30 seconds), the pre-/post-phases and the general overhead of a Jenkins job can have a significant impact. By combining all 5 environments (the 4 we have now plus the Jetty standalone one) into a single job, we can almost completely avoid dependency downloads in 4 out of 5 cases. At the same time, 47 different jobs should still offer plenty of parallelism to fully utilize the CI capacity. If we can add exceptions for big modules, we could get even better utilization thanks to more uniform execution durations.
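As a rough back-of-the-envelope check of point 3, using only the figures above (47 modules, 5 environments, 30 seconds of dependency download per job), this sketch estimates the agent time spent on downloads alone. It deliberately ignores pre-/post-phases, general job overhead and any big-module exceptions, and it measures summed agent time, not wall-clock time, since jobs run in parallel.

```python
# Figures taken from the rationale above.
MODULES = 47
ENVIRONMENTS = 5
DOWNLOAD_SECONDS = 30

jobs_one_per_env = MODULES * ENVIRONMENTS  # each job downloads its dependencies
jobs_grouped = MODULES                     # one download per module, reused for all environments
saved_seconds = (jobs_one_per_env - jobs_grouped) * DOWNLOAD_SECONDS

print(f"{jobs_one_per_env} -> {jobs_grouped} jobs, ~{saved_seconds / 60:.0f} min of downloads saved per run")
# prints "235 -> 47 jobs, ~94 min of downloads saved per run"
```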
I see some disadvantages of this proposal:
When you’re waiting for an environment build to finish before a release, you’ll no longer know early that docker-based tests passed at least on one or two environments. I hope that having faster builds in general will make this situation less frequent, but it could still be an issue.
Similarly, this makes it less okay to release without waiting for an environment build.
Initialization failures on one environment will mark all jobs as being in error, not just those of that environment.
There would also be the possibility to group several test modules of the same environment. The disadvantage I see here is that it is more difficult to find a suitable grouping criterion (just group 4 consecutive modules?), and the savings from reduced dependency downloads would be smaller.
I’m not a big fan of this, as it would mean having a single execution of those tests per day, and even though they’re long, we do manage to get some feedback from them during the day when working on a specific feature.
I’d prefer that we work on reducing other tasks, e.g., not building JavaDoc / checking coverage on each commit build.
Now maybe another option would be to apply your proposal and to have an easy way to manually trigger a CI build that runs the integration tests of a few specific modules when we want to check something.
+1 on the idea, now I’m not even sure we need the javadoc to be built for each commit.
Does it mean we’ll lose parallelized builds for the different envs? OK, I just read the rationale, and that sounds like a good idea if we indeed parallelize per test module and run all env tests.
Not building those tests would definitely make the main build a lot faster.
I’m just not sure about building them only once a day. Before the pipeline, we used to have something which was not that bad in terms of results: the parallelism was at the job level instead of the build level, so we got much faster results for the non-integration parts, but the integration tests were still triggered and built when they could be (we did not have a separate job for each integration test, so they did not fill the entire agent pool).
+1. I have a slight preference for a separate job (so that it’s more clear what is what on Jenkins).
I picked an arbitrary test of the Flamingo skin tests and checked its history in Develocity for non-environment test executions on master (those are the tests that have neither the Chrome nor the Firefox tag). Of all days with at least one execution, we had:
33 days with a single test execution
25 days with two test executions
3 days with three test executions
In other words, on more than half of those days, limiting us to a single test execution wouldn’t change anything.
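The “more than half” follows directly from the three counts above; this small computation just makes the share explicit.

```python
# Number of days with 1, 2 and 3 non-environment executions of the sampled test.
days_by_count = {1: 33, 2: 25, 3: 3}

total_days = sum(days_by_count.values())
share_single = days_by_count[1] / total_days

print(f"{days_by_count[1]}/{total_days} = {share_single:.0%}")  # prints "33/61 = 54%"
```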
Now one can ask whether we can still get a result on the same working day on which the change was made. I therefore took a closer look at the last 50 test executions (since those are the ones available). They cover the date range from July 6 to July 28.
There were 22, 11, and 2 days with one, two and three executions, respectively. There were 19 days with one execution between 9 and 18 o’clock and just 1 day with more than one (two) executions between 9 and 18 o’clock. As I believe it is unlikely that a test execution before 12 o’clock covers a change made during working hours of the same day, let’s filter further: there are just 14 days with a test execution between 12 and 18 o’clock, and there is no day with more than one test execution in that time period.
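The per-day counting above can be sketched as follows, assuming execution start times are available as `datetime` values. The helper name is hypothetical, and treating the window as “hour in [start, end)” is an assumption about how “between 12 and 18 o’clock” is interpreted.

```python
from collections import Counter
from datetime import datetime


def days_by_execution_count(timestamps, start_hour=0, end_hour=24):
    """Hypothetical helper: map 'executions per day in the window' -> 'number of days'.

    Only executions whose hour lies in [start_hour, end_hour) are counted.
    """
    per_day = Counter(ts.date() for ts in timestamps if start_hour <= ts.hour < end_hour)
    return Counter(per_day.values())


# Made-up sample timestamps, for illustration only.
sample = [
    datetime(2000, 1, 3, 8, 15),
    datetime(2000, 1, 3, 14, 30),
    datetime(2000, 1, 4, 13, 0),
    datetime(2000, 1, 4, 16, 45),
    datetime(2000, 1, 5, 2, 0),
]
print(days_by_execution_count(sample))          # all day: two days with 2, one day with 1
print(days_by_execution_count(sample, 12, 18))  # 12 to 18 o'clock: one day with 1, one day with 2
```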
Therefore, yes, if you’re lucky, you’ll get a test execution during working hours for a change you made that day. But these numbers also confirm my suspicion: when a build is already running on a day, there is almost no chance that a second build will finish on the same working day. That’s why I suggest dramatically reducing what we execute in the main build, so we have a real chance of running several of them per day and can actually notice and fix any problems we introduce on the same day.
If you prefer this idea, as a first step we could also group several (say 4 or 5) different UI tests into one Jenkins build to see if this already improves anything (for both types of executions).
I don’t have anything against that; I agree that we need a job that can fully finish without waiting for integration tests (so that another one can run). I’m just not sure how often we should trigger the integration tests.
What we could do is trigger an environment test build after the main build completes (so before the full job completes, but after everything has been built). However, to avoid filling the CI with builds, I would suggest doing this only when:
The main build completes without error (test failures are okay).
There is not already an environment test build queued for the same branch.
There is no further main build queued for the same branch.
The idea behind these conditions is basically to trigger an environment test build once all pending changes have been built successfully. As the last main build of a day should trigger an environment test build, there would no longer be any need to trigger one with a daily timer. This would also ensure that the environment test build is triggered after the last main build completes, so even if that is delayed, e.g., until 6am, we would get a fresh environment test build including those changes.
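The three triggering conditions can be expressed as a small decision function. This is only a sketch of the logic, not Jenkins code: the result values are modelled after Jenkins’ build results (where test failures usually mark a build UNSTABLE), and the job names `main` and `environment-tests` as well as the `QueuedBuild` type are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    SUCCESS = "SUCCESS"
    UNSTABLE = "UNSTABLE"  # completed, but with test failures
    FAILURE = "FAILURE"


@dataclass(frozen=True)
class QueuedBuild:
    job: str  # hypothetical job names: "main" or "environment-tests"
    branch: str


def should_trigger_environment_tests(branch, main_result, queue):
    """Decide whether a finished main build should trigger an environment test build."""
    # 1. The main build completed without error (test failures are okay).
    if main_result not in (Result.SUCCESS, Result.UNSTABLE):
        return False
    queued_for_branch = {b.job for b in queue if b.branch == branch}
    # 2. No environment test build is already queued for the same branch.
    if "environment-tests" in queued_for_branch:
        return False
    # 3. No further main build is queued for the same branch.
    if "main" in queued_for_branch:
        return False
    return True
```

Because the check runs when each main build finishes, the last main build of the day is the one that ends up triggering the environment tests, which is what makes the daily timer unnecessary.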
Sounds good, +1. This should avoid a lot of useless build work. I’m wondering if we should apply the same criteria to the other cross-job triggers we have (xwiki-commons → xwiki-rendering and xwiki-rendering → xwiki-platform).
Sure, I actually assumed this was the default behavior of Jenkins.
It might be a bit risky, but I guess that in the exceptional case where the main build keeps having stuff to build for a very long time (which I suspect should be very rare without the integration test execution, since the job build time would be reduced quite a lot) and we need an integration test build, we can always trigger it manually.
The reason for running the func tests in one environment is to get an idea of whether a change in a module broke that module or dependent modules. If you remove them, you lose the information that, if this build passes, the code base is in good shape. You’d now need to wait for the env tests of the default env to get that same information, and a passing main build would give you a false sense of satisfaction. Overall, you’ll need to wait exactly the same amount of time (actually even a bit more, since other jobs might execute in between).
In other words, you’ll need to wait even longer than now to know whether a commit you made broke something, which is the main reason for this CI build.
OTOH we’ll know of compile errors sooner. We won’t know of revapi or checkstyle errors sooner, since these are in the quality build. BTW the quality build should also report compile errors, that build also starts as soon as there’s a commit, and AFAIK it also executes unit tests. So I’m not sure what we’d gain by changing the main build.
If we were to remove the func tests from the main build, then we should drop the main build altogether and rely on the quality one (or rename that one to the main build). Maybe I’m missing something important that we have in the main build and not in the quality build? I see we don’t use the snapshots profile in the quality build, for example. Maybe we could add it.
Another idea to reduce the execution time of the main build would be to execute only the modules with changes and all their dependent modules. I researched this a long time ago, and maybe there’s been progress on enabling this.
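The core of “changed modules plus all dependent modules” is a transitive walk over the reverse dependency graph; within a single Maven reactor, `mvn -pl <modules> -amd` (`--also-make-dependents`) does something similar. The sketch below assumes a hypothetical `dependents` map and made-up module names, and it does not address the hard part of deciding what changed.

```python
from collections import deque


def affected_modules(changed, dependents):
    """Return the changed modules plus every module that transitively depends on them.

    `dependents` is a hypothetical map: module -> modules that directly depend on it.
    """
    affected = set(changed)
    todo = deque(changed)
    while todo:
        module = todo.popleft()
        for dependent in dependents.get(module, ()):
            if dependent not in affected:
                affected.add(dependent)
                todo.append(dependent)
    return affected


# Made-up dependency graph, for illustration only.
deps = {
    "commons": ["rendering"],
    "rendering": ["platform-core"],
    "platform-core": ["flamingo", "ui-tests"],
    "other": ["other-ui"],
}
print(sorted(affected_modules(["rendering"], deps)))
```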
The problem is that it is hard to know what changed. For example, there could have been changes in xwiki-rendering or xwiki-commons that break tests in an arbitrary module. I would rather rely on the build cache speeding up the build of truly unchanged modules.
Regarding the main build, we would still have the non-Docker UI tests and the flavor tests in the non-environment build, and they would still depend on the main build. Of course, if we want to reduce the overall CI usage, it could be a good idea to drop the main build, too. I would suggest implementing those changes one by one and seeing how they affect our CI.
That is covered since those tests will have a dependency on the changed module.
I don’t understand. If you’re suggesting removing the Docker-based func tests, then you should also propose removing those, since they serve the same purpose and take as much time; it’s just a different technology.
EDIT: IMO we need to keep our build consistent and logical.
The reason we have two builds is that we wanted to have a result which is not blocked by “less critical” quality problems (which immediately stop the build). Basically the same reason why we don’t stop the build immediately when there are test failures.
I don’t mind moving even more tests to the “environment build” (that should probably be renamed to “Integration Tests” or “UI Tests”), I just proposed to start with the Docker-based tests as they are in the environment build already. I haven’t checked how difficult it would be to move other integration tests.
Yes, I know this, but my question was in the context of dropping func tests from the main build. If we do this, then I don’t see why we would keep the main build instead of the quality build. Dropping the main build would speed up the CI a lot without losing anything (unless I’m missing something, which was my question).
It means that we then don’t get any snapshot modules deployed if there is any quality issue, and thus we cannot start any UI tests in the other build that use the freshly built modules. With my suggestion of how to change the triggering of that build, we wouldn’t start any build at all; with the timer-based start we would start it, but it would execute a strange mix of new tests on old modules that could again give a false sense of “everything is okay” even though a change might have broken the tests.
For the exact same reason, I don’t see why removing the integration tests would change anything here: you would still be missing a lot of modules if any quality-related failure happens at the beginning of the build.