How to migrate existing data from MariaDB to another database?

We have a production instance of XWiki v7.0.1 on MariaDB 10.1. The instance has been running for many years, so the situation is “too big to fail”.

I am trying to upgrade XWiki to the recent v13.10.9 release, but the database encoding is still “latin1”, while the newer versions use “utf8”. In addition, the Release Notes indicate testing results only for HyperSQL, MySQL, Oracle, and PostgreSQL. So MariaDB appears to be a non-standard configuration, although it has been running for years.

Before doing more testing, we had better move the existing data from MariaDB to one of the supported databases, so that any issue we find will not be a false positive caused by our non-standard database.

Do we have a way to do this? Please point it out if I have missed something on the official documentation website.

I appreciate your help.

See https://www.xwiki.org/xwiki/bin/view/Documentation/AdminGuide/Installation/InstallationWAR/InstallationMySQL/#HConvertadatabasefromlatin128orcollationutf8_ci29toutf8mb42Futf8mb4_bin.

By the way, “utf8” was already the expected encoding in XWiki long before 7.0.1. Currently, the recommended encoding is “utf8mb4”.
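
If you want to confirm what your database currently uses, a quick check (standard information_schema; this assumes the database is named xwiki):

mysql -e "SELECT default_character_set_name, default_collation_name FROM information_schema.schemata WHERE schema_name = 'xwiki';"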

As you can see on https://dev.xwiki.org/xwiki/bin/view/Community/SupportStrategy/DatabaseSupportStrategy, MariaDB is a perfectly standard configuration. I’m pretty sure it is actually the most used configuration in production (it’s the one used for myxwiki.org and xwiki.org, for example).

Thank you for pointing out the minor typo: “utf8” should be “utf8mb4”.

However, the issue still exists for a testing scenario using MariaDB v10.9.2 starting from XWiki v7.0.1.

  • First, if we create the database in utf8mb4 with the SQL command create database xwiki default character set utf8mb4 collate utf8mb4_bin;, XWiki v7.0.1 later fails at 12% of the initialization with the error message Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535.

  • Second, if we create the database in latin1, XWiki v7.0.1 works fine, and I guess this is why the previous engineer configured the system this way. However, upgrading to XWiki v13.10.9 does not change the column definitions, and if we then try to change the encoding to utf8mb4, the same error is triggered as above.

Partial exception trace of the error at 12% initialization:

...
Caused by: org.hibernate.HibernateException: Failed updating schema while executing query [create table 
activitystream_events (ase_eventid varchar(48) not null, ase_requestid varchar(48), ase_stream 
varchar(255), ase_date datetime, ase_priority integer, ase_type varchar(255), ase_application 
varchar(255), ase_user varchar(255), ase_wiki varchar(255), ase_space varchar(255), ase_page 
varchar(255), ase_hidden bit, ase_url varchar(2000), ase_title varchar(2000), ase_body varchar(2000), 
ase_version varchar(30), ase_param1 varchar(2000), ase_param2 varchar(2000), ase_param3 
varchar(2000), ase_param4 varchar(2000), ase_param5 varchar(2000), primary key (ase_eventid)) 
ENGINE=InnoDB]
...
        ... 20 common frames omitted
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Row size too large. The 
maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, 
check the manual. You have to change some columns to TEXT or BLOBs
...

That’s because utf8mb4 takes more room than utf8 (utf8mb3), and the schema back then was optimized with utf8 in mind (so converting to utf8 should work fine). You could upgrade with latin1 and change the encoding afterwards (once the schema has been migrated to something compatible with utf8mb4).

This works as long as you don’t use any character that is not supported by latin1 (like non-Latin characters). The reason a database is in latin1 is generally that it used to be the MySQL/MariaDB default, and whoever created the database did not read the XWiki recommendation properly.
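
To make the row-size arithmetic concrete (rough figures; the exact InnoDB overhead depends on the row format): the activitystream_events table above has eight varchar(2000) columns and seven varchar(255) columns, and each varchar character reserves up to 3 bytes in utf8 (utf8mb3) but 4 bytes in utf8mb4:

utf8 (utf8mb3): 8 × (2000 × 3) + 7 × (255 × 3) ≈ 53,400 bytes → under the 65,535 limit
utf8mb4:        8 × (2000 × 4) + 7 × (255 × 4) ≈ 71,100 bytes → over the limit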


Thank you for your hints, @tmortagne.

I made a first successful test upgrade toward a smaller goal, v7.1.4. I am now in the process of testing further toward v13.10.9.

Charset latin1 is non-standard for sure, so before the upgrade I converted it to utf8 (not utf8mb4, for the reasons given above). Steps (a command sketch follows the list):

  • Dump the production data from v7.0.1 in text format, keeping the insert commands only; no DDLs
  • On a virtual machine, install a blank new instance of the same version, v7.0.1, on a utf8 database. Then dump that database to get the DDLs only; no insert commands
  • On the virtual machine, restore the database with the DDLs dumped before, so we get a purged empty database with its structures in the utf8 character set
  • Import the data with the insert commands dumped from production, so now we have the production data in utf8
  • Start up Tomcat and verify that the utf8 instance works OK, then start the upgrade with the WAR file
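
For reference, a minimal sketch of those steps with the MariaDB client tools, assuming the database is named xwiki (credentials and host options omitted; adjust to your setup):

# 1. On production (v7.0.1): dump the data only (insert commands, no DDLs)
mysqldump --no-create-info xwiki > xwiki-data.sql

# 2. On the VM, after installing a blank v7.0.1 on a utf8 database: dump the DDLs only
mysqldump --no-data xwiki > xwiki-ddl-utf8.sql

# 3. Recreate an empty utf8 database from those DDLs
mysql -e "DROP DATABASE xwiki; CREATE DATABASE xwiki DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;"
mysql xwiki < xwiki-ddl-utf8.sql

# 4. Load the production data into the utf8 schema
mysql xwiki < xwiki-data.sql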

With the above preprocessing, the upgrade was smooth, and I got the differences below by comparing the DDLs before and after.

In the previous testing of upgrading directly to v13.10.9, we did not see any difference in the DDLs. So I guess the upgrade includes some automatic database conversions in the Java code, and maybe they failed due to our non-standard configuration. Right?

My Question:

How do we verify that these database conversions are correctly applied? And do we call them migrations, as in the Python context?
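
A sketch of what I plan to check (the first query is standard information_schema; the second assumes XWiki’s xwikidbversion table, which, if I understand correctly, records the data-migration level the wiki has reached):

# Which character sets the columns actually use after the upgrade
mysql -e "SELECT character_set_name, COUNT(*) FROM information_schema.columns WHERE table_schema = 'xwiki' GROUP BY character_set_name;"

# The migration level XWiki recorded
mysql -e "SELECT * FROM xwiki.xwikidbversion;"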

DDL differences before and after the upgrade:

$ diff xwikiv-v7.0.1-utf8-ddl-xwiki-202209281146.sql xwikiv-v7.1.4-utf8-ddl-xwiki-202209281521.sql
208a209
>   `XWA_MIMETYPE` varchar(255) COLLATE utf8_bin DEFAULT NULL,
265c266
< ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
---
> ) ENGINE=InnoDB AUTO_INCREMENT=79 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
672c673
< ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
---
> ) ENGINE=InnoDB AUTO_INCREMENT=763 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
789c790
< -- Dump completed on 2022-09-28 11:46:33
---
> -- Dump completed on 2022-09-28 15:21:35

By the way, the earlier versions, e.g. v7.0.1 and v7.1.4, do not officially support MariaDB, and the file hibernate.cfg.xml does not contain any entry for MariaDB. However, it works if you add a MariaDB URI to the MySQL section. So we will keep this setting until we reach a later version with official support.

I would highly recommend using the MariaDB connector and not the MySQL one, especially before applying the upgrade to 13.10.9. One reason is that Hibernate identifies MariaDB as the very old MySQL 5.5 when using the MySQL connector, which can have an impact on the way queries are generated and also on some migrations (for example, XWiki won’t increase the maximum size of short string fields because the new size is not supported by MySQL 5.5).
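
For example, switching the connector could look like this (the paths and the connector version are just an illustration; pick whatever matches your environment):

# Replace the MySQL connector jar with the MariaDB one in the XWiki webapp
cd /path/to/tomcat/webapps/xwiki/WEB-INF/lib
rm -f mysql-connector-java-*.jar
curl -O https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.6/mariadb-java-client-2.7.6.jar

# Then point WEB-INF/hibernate.cfg.xml at the MariaDB driver:
#   connection.url          -> jdbc:mariadb://localhost:3306/xwiki
#   connection.driver_class -> org.mariadb.jdbc.Driver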


Does anyone remember from which version XWiki started officially supporting MariaDB? By that, I mean using the dedicated MariaDB connector, not borrowing MySQL’s.

The good news is that I have reached v7.4.6 on the testing virtual machine. However, running such small steps feels too slow, so I hope to take bigger steps for a faster upgrade path.

Also, thank you for your replies to my other posts. I will append more details here as the testing continues.

Good news: with the database in charset utf8, I have successfully reached the upgrade target for the first time.

Comparing the database DDLs of the start and target versions, we can see the data type varchar(2000) becoming longtext, among other changes. As attached below, a diff shows about 300 lines of differences, which means the automatic database changes are functioning.

xwiki-ddl-diff-v13.10.9-vs-v7.0.1.txt (13.3 KB)
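
For anyone verifying the same thing, the converted columns can also be listed directly instead of diffing full dumps (standard information_schema, nothing XWiki-specific):

mysql -N -e "SELECT table_name, column_name FROM information_schema.columns WHERE table_schema = 'xwiki' AND data_type = 'longtext';"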

The testing upgrade path is:

(start) v7.0.1 -> v7.1.4 -> v7.4 -> v7.4.6 -> v13.10.9 (target)

The above path is only a first happy-path run, and it contains extra intermediate hops for proof-of-concept purposes. We are running more tests to finalize a faster and more reliable change plan.

By the way, while at v7.4.6, we attempted upgrades to v8.4.6 and v8.4, but both failed. If it is of interest to you, see this post.

We will convert the database again from charset utf8 to utf8mb4, and this part should be easier.
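
Roughly, the plan for that conversion (it follows the pattern from the documentation page linked earlier, generating one CONVERT TO statement per table, and assumes the post-upgrade schema is already utf8mb4-compatible):

# Change the database default, then convert every table
mysql -e "ALTER DATABASE xwiki CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;"
mysql -N -e "SELECT CONCAT('ALTER TABLE ', table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;') FROM information_schema.tables WHERE table_schema = 'xwiki';" | mysql xwiki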

Future question:

We aim to use the Docker image, so I will submit another question shortly. Thank you again for your help and patience.