How do I find the illegal hex character that halted a MediaWiki XML conversion through Filter Streams?

whit · July 12, 2021, 7:56pm

This is tricky to reproduce. When I pull out the page stanzas where the conversion choked and import them one at a time, they work. So something in the larger context sets up the failure on certain strings containing % signs within those page stanzas. Any ideas on what else might be involved in setting up the failures? I do want to make the import dependable, since our MediaWiki keeps getting updated while we decide whether XWiki is a viable replacement.
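To narrow down candidates without a full import run, something like the following could list every string in the export where a % is not followed by two hex digits. This is only a sketch and rests on an assumption I have not confirmed: that the "illegal hex" complaint comes from something percent-decoding these strings. The function name `find_suspect_percents` and the 20-character snippet width are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# A '%' that is NOT followed by two hex digits, i.e. not a valid %XX escape.
BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_suspect_percents(path):
    """Return (page title, element name, snippet) for each suspect '%'."""
    hits = []
    # iterparse reads the file incrementally; we only act when a <page> closes.
    for _, page in ET.iterparse(path, events=("end",)):
        if local(page.tag) != "page":
            continue
        title = next((e.text or "" for e in page if local(e.tag) == "title"), "")
        for elem in page.iter():
            text = elem.text or ""
            for m in BAD_ESCAPE.finditer(text):
                start = max(m.start() - 20, 0)
                hits.append((title, local(elem.tag), text[start:m.start() + 20]))
        page.clear()  # drop the page's children to keep memory use down
    return hits
```

Calling `find_suspect_percents("dump.xml")` (the file name is a placeholder) gives a list that can be compared against the pages where the conversion stopped. If the failing pages are not in the list, the percent-decoding assumption is probably wrong.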

I suppose I could write something that would move just those pages with “%” anywhere in the title (and/or perhaps pages which link to them by name) out of the larger XML file into a separate file, and then import that file separately, perhaps avoiding the context in which the strings are misread as “illegal hex”. Meanwhile I’ll keep looking for a way to reproduce the error without throwing the full XML at it, since full-file trial runs take a good chunk of time.
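A rough sketch of that split, in case it helps anyone trying the same thing. It assumes the export has the usual layout of a `<mediawiki>` root holding `<siteinfo>` plus `<page>` elements, and that the file fits in memory (it is parsed whole, not streamed). It only looks at titles; pages that link to the affected ones are not handled. `split_by_percent` and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def split_by_percent(src_path, pct_path, rest_path):
    """Write pages whose title contains '%' to pct_path, all others to rest_path.

    Returns the number of pages written to pct_path.
    """
    root = ET.parse(src_path).getroot()
    # Reuse the export namespace as the default one so the output
    # is not littered with ns0: prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    pct_root = ET.Element(root.tag, dict(root.attrib))
    rest_root = ET.Element(root.tag, dict(root.attrib))
    moved = 0
    for child in root:
        if local(child.tag) != "page":
            # <siteinfo> and anything else non-page goes into both files.
            pct_root.append(child)
            rest_root.append(child)
            continue
        title = next((e.text or "" for e in child if local(e.tag) == "title"), "")
        if "%" in title:
            pct_root.append(child)
            moved += 1
        else:
            rest_root.append(child)

    for out_root, out_path in ((pct_root, pct_path), (rest_root, rest_path)):
        ET.ElementTree(out_root).write(out_path, encoding="utf-8", xml_declaration=True)
    return moved
```

For example, `split_by_percent("dump.xml", "pct-pages.xml", "other-pages.xml")` would leave two files that can be imported separately. The smaller one should also make for much quicker trial runs when trying to reproduce the error.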