XS Quality Report proposal

ilie.andriuta · February 9, 2022, 1:00pm

Hi, everyone!

The quality of the XWiki product and Recommended Extensions is very important. To correctly assess how that quality evolves, we need to define the process for a Quality Report and the metrics to track.

Also, creating a Quality Dashboard for every year/product cycle will provide useful information (we currently have XWiki Standard Bug Statistics - XWiki.org JIRA for Platform which provides a very good insight into quality and Recommended Extensions Bug Statistics - XWiki.org JIRA for Recommended Extensions).

Since the most important part of this process is selecting useful metrics (the ones that best reflect reality) for comparing the last year with the previous one, I propose starting with the following list:

Number of JIRA issues (Bugs) Created vs Resolved (for example using the JIRA filter category = 10000 AND issuetype = Bug ORDER BY key DESC)
Number of JIRA issues (Non Bugs) Created vs Resolved (using the JIRA filter category = 10000 AND issuetype in (Idea, Improvement, "New Feature", Task) ORDER BY key DESC)
Number of regressions fixed in the last year vs previous year (by using for ex. for 2021: category = 10000 AND labels = regression AND created >= 2021-01-01 AND created <= 2021-12-31 ORDER BY created DESC)
Number of Security Issues fixed in the last year vs previous year (by using for ex. category = 10000 AND labels = security AND created >= 2021-01-01 AND created <= 2021-12-31 ORDER BY created DESC)
Performance stats comparing latest finished cycle with the previous cycle (latest report is available here)
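The year-scoped filters in the list above can be generated rather than hand-edited for each year. Below is a minimal sketch, not an official XWiki tool: the function names are hypothetical, and only the category id and labels come from the filters quoted above. It uses a half-open date range (`< next Jan 1`) on the assumption that JQL reads a date-only bound as midnight at the start of that day, in which case `created <= 2021-12-31` would miss issues created during December 31; that assumption is worth verifying on the actual JIRA instance.

```python
from datetime import date

# Category id copied from the filters quoted above; everything else here
# (function names, the half-open date range) is illustrative.
XS_CATEGORY = 10000


def year_window(year: int, field: str = "created") -> str:
    """Return a JQL clause matching dates inside the given calendar year."""
    start = date(year, 1, 1).isoformat()
    end = date(year + 1, 1, 1).isoformat()
    # Half-open range: includes Jan 1 of `year`, excludes Jan 1 of `year + 1`.
    return f'{field} >= "{start}" AND {field} < "{end}"'


def labelled_issues(label: str, year: int) -> str:
    """Build the per-year filter for a label such as 'regression' or 'security'."""
    return (
        f"category = {XS_CATEGORY} AND labels = {label} "
        f"AND {year_window(year)} ORDER BY created DESC"
    )


if __name__ == "__main__":
    # Print the filters for the two years being compared.
    for label in ("regression", "security"):
        for year in (2020, 2021):
            print(labelled_issues(label, year))
```

Generating both years from the same function keeps the two filters identical apart from the dates, which is what a year-over-year comparison needs.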

Any feedback on whether these metrics are appropriate, and any suggestions for other important metrics, will be appreciated!

Thanks!

MichaelHamann · February 9, 2022, 1:20pm

Could you maybe explain why you consider these to be good metrics, and whether you consider a high value or a low value to be good? I’m trying to understand how these metrics should be interpreted and what impact optimizing them could have on the development of XWiki. Some feedback below:

I guess what you propose is that the goal is to maximize the number of resolved bugs minus the number of created bugs? Optimizing this metric could disincentivize reporting bugs that are not going to be fixed soon, even though they might still be important for tracking technical debt. On the other hand, solving bugs is important, of course.

Do you suggest we should stop reporting potential improvements before implementing them to avoid that we have too many open improvement issues? Of course, it is nice to have many features/improvements implemented, but some issues represent big features while others represent minor improvements so I do not think absolute numbers are a good comparison here.

It is not clear to me what the goal is here. Is it bad to fix many security issues, as it means acknowledging we had many security issues? Or is it bad to fix few security issues, as it means we left many security issues unaddressed?

Your query seems to indicate you are assuming regressions are always fixed and were introduced in the current version? Maybe it would be more reasonable to track regressions introduced by releases in that year? As we should avoid introducing regressions this seems to be a reasonable metric to optimize.

With all these metrics, I’m wondering how you categorize issues that haven’t been closed as fixed but closed as duplicate, as won’t fix or closed by another issue.

MichaelHamann · February 9, 2022, 1:46pm

What I’m wondering is if there are other metrics to track. Some ideas:

Metrics reported by Sonarcloud (not sure if they have historical summaries, though).
New/fixed flickering tests or overall number of flickering tests (available in Jira).
Number of files ignored by CheckStyle/number of checkstyle violations in these files (if we can get them).

Note that these metrics are more relevant for developers than for users but still it might be a good idea to track them, too.

vmassol · February 9, 2022, 1:50pm

Hi @ilie.andriuta

That’s a very good start. Some comments:

I don’t think the number of JIRA issues matters much. What feels more important is how it evolves, as shown on XWiki Standard Bug Statistics - XWiki.org JIRA
Note: when drawing conclusions from the reports we’ll need to include all work done by the XWiki core committers. The reason is that only considering XS can present a false view of the work when comparing it year over year, as XWiki core devs can work on extensions as was the case in 2021 for ex where a lot of the work was done on 3 extensions: Change Request, Replication and Numbered Content.
Re regressions, should we use blockers or regressions? I’d say blockers. I would also only include final releases (we need to exclude RC releases). Then see the evolution using a Created vs Resolved graph IMO.
Same for security issues, I’d use a Created vs Resolved graph as the raw number doesn’t matter that much.
I’d add an important new metric to your list: Global TPC (global test percentage coverage) and its evolution over time. As a sub-metric that is easier to compute, we could track the evolution of the total number of tests. Re Global TPC, see http://maven.xwiki.org/site/clover/. Note to self: check our CI job and make it pass again: Clover [Jenkins]

Thanks

vmassol · February 9, 2022, 1:52pm

To be clear, what’s important is the difference and time between created and resolved, not the raw numbers.
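To illustrate the point about the difference and the time between created and resolved, here is a minimal sketch over made-up data. The issue dates and function names are hypothetical; in practice the dates would come from a JIRA export. It computes the created-minus-resolved gap for a year and the median number of days taken to resolve an issue.

```python
from datetime import date
from statistics import median

# Hypothetical sample: (created, resolved), with None for issues still open.
ISSUES = [
    (date(2021, 1, 10), date(2021, 1, 20)),
    (date(2021, 2, 1), date(2021, 3, 3)),
    (date(2021, 3, 15), None),
    (date(2021, 5, 2), date(2021, 5, 4)),
]


def created_vs_resolved(issues, year):
    """Return (created, resolved, gap) counts for the given year."""
    created = sum(1 for c, _ in issues if c.year == year)
    resolved = sum(1 for _, r in issues if r is not None and r.year == year)
    # A positive gap means the backlog grew during the year.
    return created, resolved, created - resolved


def median_days_to_resolve(issues):
    """Median days between creation and resolution, ignoring open issues."""
    durations = [(r - c).days for c, r in issues if r is not None]
    return median(durations) if durations else None


if __name__ == "__main__":
    print(created_vs_resolved(ISSUES, 2021))
    print(median_days_to_resolve(ISSUES))
```

The median is used instead of the mean so that a single very old issue does not dominate the figure; that is a design choice of this sketch, not something decided in the thread.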

ilie.andriuta · February 10, 2022, 2:39pm

Hi! Thank you for the feedback!

@MichaelHamann
I’ve listed these metrics because I thought they were appropriate for the report.
Of course, a single filter on its own does not give the clearest picture of reality, only a general one.
I think I named these metrics poorly: I had in mind the evolution of Created vs Resolved issues, not their raw numbers.
Regarding issues closed as Duplicate, they can be excluded from the filters, but they are still “Closed”, as well as Won’t fix ones.
Maybe a clearer description for the metrics would be issues (Bugs, Non Bugs) Closed instead of Fixed, because an issue can be Closed with many Resolutions, right? Not just Fixed. It means the Assignee spent time investigating the issues and closed them, regardless of their Resolution.
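To make the Closed vs Fixed distinction concrete, here is a small sketch that breaks closed issues down by Resolution. The issue keys and the list of resolutions are made up for illustration; real data would come from a JIRA export.

```python
from collections import Counter

# Hypothetical closed issues: (issue key, resolution).
CLOSED = [
    ("EX-1", "Fixed"),
    ("EX-2", "Duplicate"),
    ("EX-3", "Fixed"),
    ("EX-4", "Won't Fix"),
    ("EX-5", "Fixed"),
]


def resolution_breakdown(closed):
    """Count closed issues per resolution."""
    return Counter(resolution for _, resolution in closed)


def fixed_share(closed):
    """Fraction of closed issues whose resolution is Fixed (0.0 if none closed)."""
    if not closed:
        return 0.0
    return resolution_breakdown(closed)["Fixed"] / len(closed)


if __name__ == "__main__":
    print(resolution_breakdown(CLOSED))
    print(fixed_share(CLOSED))
```

Reporting the breakdown next to the Closed total shows how much of the closing activity was actual fixes and how much was triage.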

@vmassol
I also think that what matters most is the evolution, not the actual number of issues.

A few questions:

Should the Change Request, Replication and Numbered Content extensions be included in all these filters (and in their respective graphs)? Or should additional filters be created to cover the issues for those extensions?
Regarding regressions, I think we should use Blockers in the end, because almost all regressions were reported as Blockers.
For Global TPC (http://maven.xwiki.org/site/clover/): I open http://maven.xwiki.org/site/clover/20210208/XWikiReport-20210128-0145-20210208-0128.html for 2021 and take into consideration the TPC New vs TPC Old percentage for ALL Modules, right? It seems the most recent reports are from 8th February 2021 (is there a schedule for when these reports are generated, or do I just not know how to access more recent ones?).

Thanks!

vmassol · February 10, 2022, 2:53pm

Regarding including those extensions in the filters: I don’t think so. This report is an XS report. If you wanted to extend it, one idea would be to include Recommended Extensions, but I’m not sure I’d do that, for several reasons:

Recommended extensions are not maintained by the XWiki Core devs
It would mix the quality of the core (i.e. XS) vs the quality of extensions. Better to keep them separate (you could have 2 reports if you wanted).

My point was that when doing the analysis, we need to take into account the activity of the core committers (and how many of them there are) to explain things, as otherwise you get a wrong analysis (e.g. saying that the number of reported issues went down because 13.x was a stabilization cycle).

vmassol · February 10, 2022, 2:53pm

Re using blockers: it’s not because almost all regressions were reported as blockers, but because blockers are… blockers, i.e. important. Regressions are only one type of blocker issue.

vmassol · February 10, 2022, 2:54pm

Re the Clover reports: as I said above, the CI job is currently broken and needs to be fixed.

ilie.andriuta · February 15, 2022, 10:29am

Thanks for the answers! I will create and populate the XS System Quality Dashboard with the respective gadgets and start a draft on Draft Documents (Drafts.WebHome) - XWiki for the Quality Report.

ilie.andriuta · March 7, 2022, 11:53am

Hi!
I’ve created the XS System Quality Dashboard and filled in the XS and Recommended Extensions Quality Report for 2021, which is a draft for now.
Please take a look and see if it’s OK (and feel free to add, delete or modify content where needed). Thanks!

vmassol · March 9, 2022, 3:44pm

Thanks @ilie.andriuta that’s good and interesting.

Some remarks:

“Evolution of Non Bugs”. The reason is not so much that we didn’t get time to work on them because we worked on some contrib extensions. It’s more that the collaboration domain is quite large and there’ll always be more demands/needs than what we can provide/implement. So I don’t think this metric is very interesting for a quality report. I’d remove it from both the dashboard and the report.
Global coverage. I haven’t checked the data you provided (BTW, the values are in %, so you should update the report), but you need to pay attention to any flickering failing tests during the coverage data gathering, since any failing test will decrease the % value. So it’s very possible that the global TPC didn’t decrease in the end. This needs to be checked.
Maybe add a recommendation section at the end with some bullet points about what could be done to improve.
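On the global coverage remark above, the effect of a flickering test can be shown with simple arithmetic. This is only a sketch with made-up numbers: the module names and element counts are hypothetical, and it models coverage as covered elements divided by total elements, which is an assumption about how the global TPC is aggregated.

```python
# Hypothetical per-module data: module -> (covered elements, total elements).
MODULES = {
    "module-a": (800, 1000),
    "module-b": (450, 500),
}


def global_tpc(modules):
    """Global coverage in percent: all covered elements over all elements."""
    covered = sum(c for c, _ in modules.values())
    total = sum(t for _, t in modules.values())
    return 100.0 * covered / total


def with_failed_test(modules, module, lost_elements):
    """Copy of `modules` where a failed test left `lost_elements` uncovered."""
    covered, total = modules[module]
    updated = dict(modules)
    updated[module] = (covered - lost_elements, total)
    return updated


if __name__ == "__main__":
    print(round(global_tpc(MODULES), 2))
    # Same code base, but one flickering test failed during this run.
    flaky_run = with_failed_test(MODULES, "module-b", 30)
    print(round(global_tpc(flaky_run), 2))
```

With these numbers the measured value drops by two points although the code and tests are unchanged, which is why a decrease between two reports needs to be checked against failing tests before drawing conclusions.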

Thx!

ilie.andriuta · March 15, 2022, 3:20pm

Hi!
I’ve updated the XS System Quality Dashboard and the XS and Recommended Extensions Quality Report for 2021 by removing the “Evolution of Non Bugs” section and restructuring it a little bit, also added some recommendations at the end (which I thought to be appropriate).

When you have some time please check the Global TPC and verify if the data is correct (I don’t know for sure how to check flickering failing tests during the coverage data gathering).

Thanks!