Stabilized and faster CI

vmassol · August 26, 2020, 12:54pm

Hi everyone,

Some of us gathered during XWiki SAS’s seminar in August and held a session on how to further improve our CI.

Here’s a summary of what we discussed.

Topics

How can we reduce flickering?
How can we stabilize the tests?
How can we reduce the time it takes to be ready for a release?
Should we extend the brainstorming to a third aspect: “how to reduce the build time?” and “how to get faster feedback from the CI?”
Split standard job / docker job

Format:

Vincent: Retry failing tests a number of times. How do we handle the extra time required?
Vincent: Put actions in the roadmap. Which ones?
Vincent: How can we ensure that new tests don’t introduce new flickers? One idea is to make sure that all testing APIs (our test DSL) take an element to wait on, so that we never forget to wait.
Vincent: What strategy should we use to migrate more tests to Docker?
Vincent: Should we remove flickering tests from the set of tests that are executed, and then have a strategy to review them, fix them and put them back?
Vincent: What strategy can we have to fix the existing flickering tests? Should all devs participate and take some to fix? Should we have someone dedicated with time for that? Should we introduce a new XWiki day for that, every week for example? What about not being allowed to start work for a release until all flickering tests are fixed? What about spending one full XWiki version release (i.e. 1 month) just on flickering tests and blocker bugs?
Vincent: We have more and more func tests and they’re both time-consuming and fragile. What about defining a strategy for when a test should be a functional one and when it should be a unit or integration one (e.g. a PageTest or even an isolated unit test)? For example, testing various conditions could be done in unit or integration tests, with only 1 path tested as a func test (to make sure all elements work together correctly).
Simon: Have a staging CI to test our script changes
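
To make the "always take an element to wait on" idea above more concrete, here is a minimal sketch in plain Java. It is not the XWiki test framework or Selenium: `WaitingActions` and `performAndWait` are hypothetical names, and the condition is modeled as a simple `BooleanSupplier`. The point is only that the API offers no way to perform an action without also saying what to wait for.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of the XWiki test framework. */
final class WaitingActions {
    private WaitingActions() {
    }

    /**
     * Runs the action, then polls the condition until it holds or the timeout expires.
     * There is deliberately no overload without a condition, so a test author
     * cannot forget to wait.
     */
    static void performAndWait(Runnable action, BooleanSupplier condition, Duration timeout) {
        if (action == null || condition == null || timeout == null) {
            throw new IllegalArgumentException("action, condition and timeout are all required");
        }
        action.run();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(10); // polling interval, arbitrary for this sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a page that becomes ready about 50 ms after the "click".
        long[] readyAt = new long[1];
        performAndWait(
            () -> readyAt[0] = System.nanoTime() + Duration.ofMillis(50).toNanos(),
            () -> System.nanoTime() - readyAt[0] >= 0,
            Duration.ofSeconds(2));
        System.out.println("Page ready, safe to assert on it");
    }
}
```

In a real page object the condition would presumably be the presence or visibility of a specific element, but that wiring is left out here.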

Run functional tests outside of the “main” build and run them in parallel on multiple agents (as is done for the docker tests). Build the pageobjects as part of the “main” build though. This would speed up the execution time of func tests. Run this after the “main” build to make sure everything is rebuilt. This also makes the artifacts from “main” available faster and pushes the test (i.e. quality) validation a bit later. Note: for tests in “platform-core”, execute them in parallel with “distribution”. See https://jira.xwiki.org/browse/XINFRA-317
Collect the list of failed functional tests and re-run them at the end in a separate job
Try again to identify flickers and to update the flickering state in JIRA in our pipeline
Try to create documentation to set up a local Jenkins CI with the same config as ci.xwiki.org (scp the config and use docker for Jenkins)
Find one or several existing functional tests that could be written as a PageTest (or a unit test) (idea of one test to be transformed: https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-flamingo/xwiki-platform-flamingo-skin/xwiki-platform-flamingo-skin-test/xwiki-platform-flamingo-skin-test-docker/src/test/it/org/xwiki/flamingo/test/ui/VelocityIT.java)
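
As a rough illustration of the "collect failed tests and re-run them at the end" and "identify flickers" actions above, here is a small in-process model in Java. It is only a sketch of the logic, not our Jenkins pipeline: `RerunFailedTests`, `Outcome` and the test names are hypothetical, and each test is reduced to a `BooleanSupplier` that returns true when it passes. A test that fails in the first pass but passes on a re-run is classified as flickering. A test that keeps failing stays failed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical model for illustration; not the actual CI pipeline code. */
final class RerunFailedTests {
    enum Outcome { PASSED, FLICKERING, FAILED }

    private RerunFailedTests() {
    }

    static Map<String, Outcome> runWithRerun(Map<String, BooleanSupplier> tests, int maxReruns) {
        Map<String, Outcome> outcomes = new LinkedHashMap<>();
        List<String> failedFirstPass = new ArrayList<>();

        // First pass: run every test once and only collect the failures.
        for (Map.Entry<String, BooleanSupplier> entry : tests.entrySet()) {
            if (entry.getValue().getAsBoolean()) {
                outcomes.put(entry.getKey(), Outcome.PASSED);
            } else {
                outcomes.put(entry.getKey(), Outcome.FAILED);
                failedFirstPass.add(entry.getKey());
            }
        }

        // Second pass, at the end: re-run only the failures, up to maxReruns times each.
        for (String name : failedFirstPass) {
            BooleanSupplier test = tests.get(name);
            for (int attempt = 0; attempt < maxReruns; attempt++) {
                if (test.getAsBoolean()) {
                    outcomes.put(name, Outcome.FLICKERING);
                    break;
                }
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Map<String, BooleanSupplier> tests = new LinkedHashMap<>();
        tests.put("stableTest", () -> true);
        tests.put("flickeringTest", () -> ++calls[0] > 1); // fails once, then passes
        tests.put("brokenTest", () -> false);
        // prints {stableTest=PASSED, flickeringTest=FLICKERING, brokenTest=FAILED}
        System.out.println(runWithRerun(tests, 2));
    }
}
```

The extra time is bounded by the number of first-pass failures times `maxReruns`, which is one way to reason about the "extra time required" question raised earlier. The FLICKERING outcomes are what a pipeline could then report to JIRA.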

Feel free to comment. I’m going to start by implementing https://jira.xwiki.org/browse/XINFRA-317

Thanks