Better test handling for our CI

vmassol · August 23, 2020, 4:06pm

Hi devs,

Right now we’re struggling with our functional tests, and especially with flickers. In the past and during STAMP I’ve tried to implement a strategy to handle flickers. It’s currently not working that great (known flickers are not recognized for example).

I’ve started listing some use cases that we may for the tests executing on our CI.

Use Cases

UC1: List the slowest executing tests (so that we can work on the slowest and more generally monitor the speed of our test, by graphing their speed over time)
UC2: List the most flickering tests (so that we can focus on them)
UC3: List known flickering tests (those registered in JIRA and open)
UC4: List discovered flickering tests (3 changes of state in the past 20 executions for example)
UC5: List the slowest unit tests (to be sure we don’t have unit tests that take too long)
UC6: Are there flickering tests which haven’t passed at least once during the current release timeframe (to help decide if we can release)?

Implementation

Implement JUnit5 TestListener and when a test succeeds or fails, log it in our Elastic Search instance (the one we currently use for ActiveInstalls ) under a different index of course).
Log the following information
- job name
- job number
- test id
- execution date
- execution time
- hostname
- is it a functional test or a unit test
- the exception if the test fails
- the console log if the test fails (found with system.out/err capture)
- is it a known flicker at time of execution
Impl note: the jenkins info can be retrieved from java using system properties.
Create some Kibana dashboard to implement the UC listed above.

This should allow us to keep a test database of all our test executions on the CI and see how it evolves. It allows to implement a lot more use cases than the ones listed, for example graph the number of tests executed over time, etc.

Other

Jenkins has a Logstash plugin to send the job logs to an ES instance (Logstash). However, 2 limitations with this (for test data);
- The data would be sent only when the maven build is finished (no realtime data)
- It’s going to take a lot of disk space (OTOH it could be interesting to have the job logs on ES for easier searching but in this case we might need a different ES instance).
Regarding showing known flickers in the job UI on jenkins, we could also query the ES instance at the end of a build to update the UI accordingly and list the flickers.

WDYT?

surli · August 24, 2020, 10:37am

Hi Vincent,

nice topic! Big +1 for advancing on it.

General questions: are the UC ordered by priority to have them?

I’m wondering about the usefulness of that info if you provide it by test in the case of Docker tests.
For docker tests I think we could need more info:

the time for the container to be created
the time for a whole scenario to run

We have to be very careful about that one: we should discard all erroring builds.

So UC1 was only for integration-tests? Or it’s a dup?

Personally I would also had another one:
UC7: Are there any flickering which haven’t failed at least once during a full year (to help define if we could stop consider it as a flickering even if we didn’t actively worked on it).

Is it something we want for our local builds? I’m tempted to say not really, so wondering if we shouldn’t have this log push at Jenkins code level?

Actually depends for what: AFAIR we iterate on the docker test module to execute them one by one no? So we would have this info almost in real time for them.

I don’t know if we could have it, but it would be interesting if we could record the current Maven project version, since you could have a failure on master in 12.6-SNAP and later in 12.8-SNAP but not in 12.6.x for example. Knowing it happened twice on master is not really helpful, while knowing in which version could help understand why in some cases.

vmassol · August 24, 2020, 12:16pm

Thanks for your feedback @surli!

Does it mean you’re ok with spending time to develop handling of our tests by ourselves (i.e. creating our database of test results in Elastic Search) instead of relying on Jenkins?

I’ve come to the conclusion that we have to do it if we wish to progress since Jenkins won’t do it anytime soon IMO.

Because we won’t have the time to work on all of them at once. So the idea was to list the ones taking the most time, to prioritize (same idea as the Sonarqube dashboard showing you the biggest contenders). I didn’t mention it but one reason tests take long is because some are badly written and wait for the timeout to occur. See Loading... (I still want to work on that issue BTW!).

Sure. We can’t easily have the time for containers to be created but we can probably have the time it takes to setup the test (the fixture time).

Regarding the whole scenario, why do you think it would be useful? We can’t compare scenario times between each others since some scenarios are small and other are large.

Yes, my idea was to recognize environment issues (as we already do BTW but we forgot to update them with new environment problems caused by the usage of docker), and to add that info to the recorded test data (similar to the flicker flag).

UC1 was for all tests. This one is only for unit in order to find our slow unit tests and fix them.

Sounds good.

I thought about it and my answer so far is: no. But it would be easy to do (easier than filtering only the CI builds). The reason I think it’s “no”, is because you need an internet connection (which is not always the case locally) and it would slow down the local execution a little bit and I’d prefer to have them as fast as possible.

For the tests yet but not for the other builds like the main build. It’s also not real time. It’s also a lot of data. I think it’s interesting to save build data but it’s an orthogonal topic and it would require a different setup, purge build data frequently, etc. While I intend to keep the test data for a much longer time. We can do all sort of stats over time with them.

So ideally we would need to save all external environment configuration data:

Maven version. We can have that BTW since there is ${maven.version} (see predefined_maven_properties/README.md at master · cko/predefined_maven_properties · GitHub).
Java version

EDIT: I think you meant the xwiki version. Yes we should log that too.

Other ideas:

I think we also need to know the list of tests executed before the current one since that can impact the results. We might want to save that too.

Thanks

surli · August 24, 2020, 2:09pm

AFAIU Jenkins only allows to keep all info about a build or no info. And our build are very expensive in terms of logs. And your proposal is to only keep some logs/info in an ES instance so that we can request it to get information on the long term.
So yeah it looks like something we need. Now I haven’t performed any search on how to do it yet, so maybe there’s already existing stuff on Jenkins that we can reuse or not, I trust you on that part if you say that we need to do it ourselves.

Still thinking on the long term: I really think that we’d need the info about the time taken for a container to be fully initialized. Maybe we need to request some APIs on TC for computing it properly, but I really think it would be helpful.

No but you could compare the evolution of a scenario when you added one test for example. And discover that a specific scenario is not long because of its tests but because of some repeated fixture. Things like that.

I’m not sure you answered my question there. I’m a bit worried that we put stuff in our java code that is related to a specific CI need. And that should be in CI level. For example, I was about to say that we should be extra-careful to not make a build (or lots of builds) fail or error because of the introduction of this ES stuff. And adding it at Java level is a very good example of something that could break our build or make them fragile for something which is really not related with building those elements.
Hence my question, about having the mechanism to push the info in the CI only, and not in the build.

For example, we could have some Java code to compute data when executing tests and create JSON files, and on the CI an addon in charge of sending those info on the ES instance.

surli · August 28, 2020, 3:39pm

About that topic, the proposal you made here is a general vision with long term goal, but I think there’s still some short term things to do that could deeply improve the way we work:

Be able to detect automatically in the CI the flicker tests that are recorded on Jira: we lost this ability with docker tests, so it now takes us a huge amount of time to check if tests are flickers or not. Time we could spend fixing them.
Be able to run a specific CI configuration for docker: we have several configurations that we run on the CI, but each time we want to reproduce a specific one we have to check again all the parameters and set them properly one by one. We should have a generic parameter that we could use for that.
Separate docker builds and standard builds from the CI, we already talked a lot about that, lately we found a technical solution on a suggestion from @tmortagne AFAIR about using git modules, so I think we should now implement it.

Those proposal are not about fixing the flicker indeed, but helping working with them and having more time to fix them.

surli · August 28, 2020, 3:49pm

Since I was fed up to type them each time, I just created this: xwiki-docker-configurations.sh · GitHub

vmassol · August 28, 2020, 4:04pm

Actually it’s the other way around. We didn’t have this before and this got implemented with the docker tests (ie when we moved to jenkins pipelines which was a prerequisite for the docker tests). The feature is still there, the issue is that it stopped working. I need to spend the time to understand why.

This is easy actually, you just need to copy paste it from the jenkins job step. I could output it to make it even easier to copy paste from the BO view. Right now you need to go in the step view.

Example:

https://ci.xwiki.org/job/XWiki/job/xwiki-platform/job/master/3700/testReport/junit/org.xwiki.administration.test.ui/AllIT$NestedUsersGroupsRightsManagementsIT/MySQL_8_0_x__Tomcat_9_x__Java_11___Chrome___Docker_tests_for_xwiki_platform_administration___Build_for_MySQL_8_0_x__Tomcat_9_x__Java_11___Chrome___Docker_tests_for_xwiki_platform_administration___createAndDeleteUser_TestUtils__TestReference_/
You copy the test desc: “MySQL 8.0.x, Tomcat 9.x (Java 11), Chrome - Docker tests for xwiki-platform-administration”
You got to the pipeline steps: https://ci.xwiki.org/job/XWiki/job/xwiki-platform/job/master/3700/flowGraphTable/ and you search for it with your browser search feature
You scroll down a bit and you copy paste it: mvn -f pom.xml clean verify -Pdocker,legacy,integration-tests,snapshotModules,docker --projects xwiki-platform-core/xwiki-platform-administration/xwiki-platform-administration-test/xwiki-platform-administration-test-docker -e -U --no-transfer-progress -Dmaven.test.failure.ignore -Dmaven.build.dir=target/mysql-8.0-pom-tomcat-9-jdk11-chrome -Dxwiki.checkstyle.skip=true -Dxwiki.surefire.captureconsole.skip=true -Dxwiki.revapi.skip=true -Dxwiki.spoon.skip=true -Dxwiki.enforcer.skip=true -Dxwiki.test.ui.database=mysql -Dxwiki.test.ui.databaseTag=8.0 -Dxwiki.test.ui.jdbcVersion=pom -Dxwiki.test.ui.servletEngine=tomcat -Dxwiki.test.ui.servletEngineTag=9-jdk11 -Dxwiki.test.ui.browser=chrome -DjenkinsAgentName="Jenkins SSH Slave a5-00dqqdyqz7qyk" .

Not sure how this is helping.

This is actually dangerous. There’s no guarantee that it matches what is executed. We regularly change the parameters (at least once per month).

vmassol · August 28, 2020, 4:05pm

Let me know if you think it’d help.

vmassol · August 28, 2020, 4:42pm

Actually it changes less often than what I said. I was thinking about the JDBC driver but that one is now using the version in the pom. So it’s only the tag for the images which do change but less often. I still believe it’s safer to use what the CI is using to be safe.

vmassol · June 24, 2024, 2:51pm

https://ge.xwiki.org/scans/tests?search.relativeStartTime=P90D&search.timeZoneId=Europe%2FParis&tests.sortField=MEAN_DURATION

https://ge.xwiki.org/scans/tests?search.relativeStartTime=P90D&search.timeZoneId=Europe%2FParis&tests.sortField=FLAKY

https://jira.xwiki.org/issues/?jql=%20labels%20%3D%20flickering%20and%20resolution%20%3DUnresolved%20%20

https://ge.xwiki.org/scans/tests?search.relativeStartTime=P90D&search.timeZoneId=Europe%2FParis&tests.sortField=FLAKY

Use GE and find conditions to identify functional tests or unit tests (e.g. not:CI stage=Build for IT* - not enough but a start).

Can be done using GE, using the timeframe of the release and sorted by most flickering, if you get one with 100% then you have it.

https://ge.xwiki.org/scans/tests?search.buildToolType=maven&search.timeZoneId=Europe%2FParis&tests.sortField=FLAKY

Conclusion:

I don’t think anymore that we need a new tool since ge.xwiki.org (Develocity) seems to globally do the job for us.