As initially suggested by @MichaelHamann, it seems the easiest way to fix the linked issue would be to simply stop storing diffs and always store the full content for each version (something you can already do by setting xwiki.store.rcs.nodesPerFull=1 in xwiki.cfg).
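For reference, forcing full content storage is already a one-line change in xwiki.cfg:

```properties
# xwiki.cfg: store the full document content for every version instead of diffs
# (the current default is 5, i.e. one full copy every 5 versions)
xwiki.store.rcs.nodesPerFull=1
```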
I think I heard someone in the back complaining that this is going to waste tons of space. The (hopefully sufficient) answer to that is compression. Currently, we store the full content only every 5 versions, so if the compression is good enough to reduce the size to something close to 20%, it should not eat more space (we might actually save space in some cases, if the changes are big enough). The first idea that comes to mind is to compress the data ourselves before storing it, but I went through the documentation of the various databases we officially support and spotted a native compression solution for each one:
PostgreSQL: content is compressed by default, but only when it is bigger than 2 kB (this can be changed using TOAST_TUPLE_THRESHOLD and TOAST_COMPRESSION_THRESHOLD, but the documentation is not very explicit on how to actually set those; it might be more obvious to someone more used to PostgreSQL than me), and a document without any xobject and with little content can easily stay below that threshold
MySQL and MariaDB: it can be enabled when creating the table (if we find a way to tell the Hibernate initializer that), or shortly after creation (on our side) by setting the row format to COMPRESSED (the default is DYNAMIC, which also has some storage optimization, but not as much as COMPRESSED); see the SQL sketch after this list
Oracle: similarly to MySQL and MariaDB, you can tell Oracle that you would like the table to be compressed (COMPRESS); this case is also covered in the sketch after the list
HSQLDB: this one is not too critical as it’s not really recommended to use it in production, but it can be enabled in the hibernate.cfg.xml using the property hsqldb.lob_compressed
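To make the MySQL/MariaDB and Oracle parts more concrete, here is roughly the kind of statement involved. This is a sketch only: the exact syntax would need to be validated on each database, and on Oracle it is worth checking whether the revision content column, if stored as a LOB, needs dedicated (SecureFiles) LOB compression on top of table compression.

```sql
-- Sketches only, to be validated on each database; xwikircs is the history table
-- from our Hibernate mapping.

-- MySQL / MariaDB (InnoDB): switch the row format from DYNAMIC to COMPRESSED
-- (requires innodb_file_per_table, which is the default in recent versions).
ALTER TABLE xwikircs ROW_FORMAT=COMPRESSED;

-- Oracle: enable table compression on the history table.
ALTER TABLE xwikircs COMPRESS;
```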
So as an experiment, I would like to propose, starting with XWiki 17.0.0 (or at least ASAP in 17.x), the following:
make sure the RCS table has some native compression enabled:
do nothing for PostgreSQL for now (we might decide later to try to force a lower threshold)
make sure compression is enabled for the table xwikircs in MySQL, MariaDB and Oracle
add a migration to convert existing tables
make sure the table creation is done with the right flag
change the default value of xwiki.store.rcs.nodesPerFull to 1, effectively disabling diffs by default (anyone can still revert it to 5, or even increase it, if not happy with the result in terms of used space)
Thanks @tmortagne,
Do you have numbers regarding the performance improvement of this change?
I’m +1 assuming we have a significant performance improvement.
The numbers I don't have are about the performance cost of enabling compression in the RCS table, for MySQL for example. But I suspect it's low (though not null, and it would be the default, I assume).
Are you sure we can do this? Are you confident that Hibernate allows controlling this or is your idea to write some DB-dependent code to issue some custom SQL for each DB to enable compression on the history table?
As I indicated above, if we don't find a way to tell the Hibernate initializer to set a specific flag in the CREATE TABLE, we can always set it right after, on our side (so basically what we'll do in the migration anyway).
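For illustration, here is a minimal sketch of that fallback, issuing the database-specific DDL ourselves through the Hibernate session. The class and method names are hypothetical (this is not the actual XWiki migration API); only Session#doWork() is a standard Hibernate call.

```java
import java.sql.Statement;

import org.hibernate.Session;

public final class RcsCompressionMigration
{
    /**
     * Apply the database-specific DDL enabling compression on the xwikircs table,
     * right after Hibernate created it (or as a migration for existing wikis).
     *
     * @param session the Hibernate session used by the store
     * @param ddl the database-specific statement, e.g. the ALTER TABLE shown earlier
     */
    public static void enableCompression(Session session, String ddl)
    {
        // Run the raw DDL on the underlying JDBC connection.
        session.doWork(connection -> {
            try (Statement statement = connection.createStatement()) {
                statement.execute(ddl);
            }
        });
    }
}
```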
+1 from my side. I would like to mention the following benefits:
better performance both for storing revisions (huge performance improvement as mentioned by Thomas) and for viewing them (just need to read a single entry, no need to apply up to 4 diffs)
much more robust storage with less potential for unreadable revisions. I think that issues like XWIKI-19596 simply won’t exist anymore when not storing diffs (I assume that issue is due to a temporarily inconsistent state of the history store).
much simpler and thus less error-prone revision deletion (at the moment, I would bet for example that you end up with broken revisions if two users delete adjacent old revisions at exactly the same time).
While this is the first step, I think we should think about next steps:
Stop supporting storing new revisions with diff.
Migrate existing diffs to full revisions and remove support for reading diffs.
It makes sense to treat them as separate steps, though; we could plan them for the next development cycles (e.g., the first one for 18.x and the second one for 19.x).
One con (which could actually turn out to be yet another pro, see below) is the “Green IT”/ecology aspect, given the increase in storage space.
However, I have no idea how to measure that, or whether the speed increase (and thus lower CPU use) doesn't completely offset it. Note that if we use DB compression, this also increases the CPU needs.
Globally, it's the extra “cost” of storing more information versus the extra “cost” of reconstructing the full information from less storage.