How do I find the illegal hex character that halted a MediaWiki XML conversion through Filter Streams?

Trying to see if we can convert our MediaWiki. Hit this error quickly:

URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "_O"
class org.xwiki.filter.FilterException: Failed to parse XML

Naturally I’d like to edit the XML to get past that. It’s unclear where the error is. I’m not finding any instance where “_O” is part of a string along with “%”. There are over 700 cases of “_O” in a text string, but all are normal text. Is there a way to get a line number in the XML file for where the conversion encounters an error like this? Or some other way to zero in on it?

Thanks,
Whit

Do you have the complete stack trace that goes along with that? Seeing which methods it went through might give some clues about where to find the problem, or where to add more logging.

Created document [Home » Main » Time_Tracking » WebHome]
URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "_O"
class org.xwiki.filter.FilterException: Failed to parse XML
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:315)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.xwiki.filter.FilterException: Failed to convert content page
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.convertToXWiki21(MediaWikiInputFilterStream.java:672)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(MediaWikiInputFilterStream.java:599)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(MediaWikiInputFilterStream.java:452)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(MediaWikiInputFilterStream.java:356)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:344)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:313)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "_O"
at java.net.URLDecoder.decode(URLDecoder.java:194)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.toEntityReference(MediaWikiInputFilterStream.java:198)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiContextConverterListener.refactor(MediaWikiContextConverterListener.java:177)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiContextConverterListener.beginLink(MediaWikiContextConverterListener.java:230)
at org.xwiki.rendering.listener.WrappingListener.beginLink(WrappingListener.java:251)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.LinkEventGenerator.begin(LinkEventGenerator.java:33)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.AbstractEventGenerator.traverse(AbstractEventGenerator.java:88)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:374)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.WPListBlockEventGenerator.traverseElements(WPListBlockEventGenerator.java:142)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.WPListBlockEventGenerator.traverse(WPListBlockEventGenerator.java:82)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.WPListBlockEventGenerator.traverse(WPListBlockEventGenerator.java:50)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.nodesToText(EventConverter.java:232)
at info.bliki.wiki.model.AbstractWikiModel.render(AbstractWikiModel.java:1245)
at org.xwiki.contrib.mediawiki.syntax.internal.input.MediaWikiSyntaxInputFilterStream.read(MediaWikiSyntaxInputFilterStream.java:104)
at org.xwiki.contrib.mediawiki.syntax.internal.input.MediaWikiSyntaxInputFilterStream.read(MediaWikiSyntaxInputFilterStream.java:47)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.convertToXWiki21(MediaWikiInputFilterStream.java:670)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(MediaWikiInputFilterStream.java:599)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(MediaWikiInputFilterStream.java:452)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(MediaWikiInputFilterStream.java:356)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:344)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:313)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I’d like to do some trial and error, removing hex characters from the XML to be input. What’s the method to clear the Filter Streams page for a fresh run at it?

The escape I’m speculating about is %3D, which substitutes for “=” in a number of quoted Jira URLs on our MediaWiki pages (because that’s what Jira does with some instances of “=” in its URLs). I suspect it because the pages appear to be imported in order, and the next hex code after the “Time Tracking” page (which has none) is, aside from %20 (space), a %3D in a Jira URL that would work just as well with “=” there.

From what I understand of the URLDecoder code, some page content seems to contain a link whose reference includes “%_O”, which is invalid URL escaping.

The MediaWiki filter assumes that references in MediaWiki links are always URL-encoded, but it might be more complex than that.
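One way to zero in on candidates without extra logging is a rough scan of the export for link targets that would not survive URL decoding. The following is a minimal sketch, assuming links use the usual [[target|label]] form and that any “%” not followed by two hex digits is what trips the decoder. The function name and the sample lines are made up for illustration; this is not part of the filter.

```python
import re

# A "%" counts as a valid escape only when two hex digits follow it.
BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")
# Target part of a [[target|label]] wiki link, up to "|" or "]]".
LINK_TARGET = re.compile(r"\[\[([^\]|]+)")


def find_bad_link_escapes(lines):
    """Return (line_number, target) for each link target with a stray '%'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in LINK_TARGET.finditer(line):
            target = match.group(1)
            if BAD_ESCAPE.search(target):
                hits.append((number, target))
    return hits


# Hypothetical sample lines standing in for the XML export.
sample = [
    "See [[Batch Two, 90% Coverage]] for details.",
    "Ticket: [[Some%20Page|label]]",
    "Status: [[100% Offline]] and [[Done 100%]]",
]
for number, target in find_bad_link_escapes(sample):
    print(number, target)
```

Since an open file iterates line by line, passing the result of open(...) for the real export instead of the sample list should give line numbers in the XML file itself.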

I’ve grepped (and searched with a text editor), and there is no instance of that string at all. There are cases of “%” in page titles, and the page titles replace spaces with “_”, for instance a page with a title “Batch Two, 90% Coverage” becomes “Batch_Two,_90%_Coverage”. But there doesn’t happen to be any such substitution where the sequence becomes “%_O”. As I said, that just doesn’t occur here.

Is there a way to make changes in the XML and make a fresh run, short of reinstalling the whole stack to start fresh?

I’m not really sure I understand your question. There is no caching between executions: every time you execute the import it reads whatever XML you give it, so if you modified the XML it will take the changes into account.

If I leave the “Filter Streams Converter” page and go back to it, I end up with the session I ran before still displayed there. How do I start a fresh instance of it? What’s on the page from the old session isn’t even the full form to specify what to import. Ah, I see, it expands if I start to fill it in fresh…

Removed each of the half-dozen lines in the XML which contained “%” as well as “_O” (always at some distance, never together anywhere). It still produced the same error in the same place. Could “_O” refer to some variable that in turn holds the string which the converter is choking on?

Progress! The problem string was in several page titles containing the phrase “100% Offline”. There was no instance with an underscore in place of that space. There must be an intermediate step in the conversion that swaps an underscore in there, which then causes the subsequent step to fail. Now it’s getting to a new error:
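To make that failure mode concrete, here is a small sketch of what seems to be happening. It assumes the converter replaces spaces with underscores before URL-decoding the reference (inferred from the error message, not confirmed from the filter source), and the check is a simplified imitation of the two conditions java.net.URLDecoder reports, not a port of it. Both function names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def check_escapes(reference):
    """Return an error description for the first bad '%' escape, else None."""
    i = 0
    while i < len(reference):
        if reference[i] == "%":
            pair = reference[i + 1:i + 3]
            if len(pair) < 2:
                # Fewer than two characters left after the "%".
                return "incomplete trailing escape"
            if not set(pair) <= HEX_DIGITS:
                return f"illegal hex characters: {pair!r}"
            i += 3
        else:
            i += 1
    return None


def normalize(title):
    # Assumption: spaces become underscores before the reference is decoded.
    return title.replace(" ", "_")


for title in ["100% Offline", "100% Complete", "100%", "A%3DB"]:
    print(title, "->", check_escapes(normalize(title)))
```

Under that assumption, “100% Offline” turns into “100%_Offline”, and the two characters after the percent sign are exactly the “_O” from the error message, even though that sequence never appears in the XML.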

Created document [Home » Template » ! » WebHome]
An Entity Reference name cannot be null or empty
class org.xwiki.filter.FilterException: Failed to parse XML
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:315)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.xwiki.filter.FilterException: Failed to convert content page
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.convertToXWiki21(MediaWikiInputFilterStream.java:672)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(MediaWikiInputFilterStream.java:599)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(MediaWikiInputFilterStream.java:452)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(MediaWikiInputFilterStream.java:356)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:344)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:313)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class java.lang.IllegalArgumentException: An Entity Reference name cannot be null or empty
at org.xwiki.model.reference.EntityReference.setName(EntityReference.java:212)
at org.xwiki.model.reference.EntityReference.<init>(EntityReference.java:154)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.toEntityReference(MediaWikiInputFilterStream.java:266)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiContextConverterListener.refactor(MediaWikiContextConverterListener.java:177)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiContextConverterListener.beginLink(MediaWikiContextConverterListener.java:230)
at org.xwiki.rendering.listener.WrappingListener.beginLink(WrappingListener.java:251)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.LinkEventGenerator.begin(LinkEventGenerator.java:33)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.AbstractEventGenerator.traverse(AbstractEventGenerator.java:88)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.AbstractEventGenerator.traverse(AbstractEventGenerator.java:91)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.AbstractEventGenerator.traverse(AbstractEventGenerator.java:91)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.ParagraphEventGenerator.traverse(ParagraphEventGenerator.java:48)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.nodesToText(EventConverter.java:232)
at info.bliki.wiki.model.AbstractWikiModel.render(AbstractWikiModel.java:1245)
at org.xwiki.contrib.mediawiki.syntax.internal.input.MediaWikiSyntaxInputFilterStream.read(MediaWikiSyntaxInputFilterStream.java:104)
at org.xwiki.contrib.mediawiki.syntax.internal.input.MediaWikiSyntaxInputFilterStream.read(MediaWikiSyntaxInputFilterStream.java:47)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.convertToXWiki21(MediaWikiInputFilterStream.java:670)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(MediaWikiInputFilterStream.java:599)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(MediaWikiInputFilterStream.java:452)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(MediaWikiInputFilterStream.java:356)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:344)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:313)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I’ll also mention in passing that to solve the “file not found” problem for images, it’s necessary to move them out of the subdirectories MediaWiki keeps them in and into a single folder, which can be done from the base folder with: find . -type f -exec mv {} . \;
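One caveat with the find one-liner is that two files with the same name in different subdirectories would silently overwrite each other. A rough alternative sketch that skips name collisions and reports them instead is below; the function name is hypothetical, and it assumes a flat folder of uniquely named files is what the import expects.

```python
import shutil
from pathlib import Path


def flatten(base):
    """Move every file below base into base itself; return skipped paths."""
    base = Path(base)
    skipped = []
    # Materialize the listing first so moves do not disturb the walk.
    for path in [p for p in base.rglob("*") if p.is_file()]:
        if path.parent == base:
            continue  # already at the top level
        target = base / path.name
        if target.exists():
            skipped.append(path)  # same name already present; leave in place
        else:
            shutil.move(str(path), str(target))
    return skipped
```

Calling flatten(".") from the base image folder would mirror the find command; anything returned in the skipped list needs renaming by hand.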

Now to find the empty reference…

Removed a couple of templates it had trouble with. Got much farther. Then ran into a “% C” string problem, in pages with titles about stuff like “100% Complete”. Running again after replacing “% C” with " percent C" universally; just taking the % out of the titles where it was followed by C somehow wasn’t enough. As a feature request, having a line number returned for whatever it errors on would be golden.
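An alternative to rewording the titles might be to escape the stray percent signs in link targets as %25, so that URL decoding turns them back into a literal “%”. This is only a sketch of the idea and has not been tried against the MediaWiki filter: it assumes link targets are what gets URL-decoded (as the stack trace suggests) and that links use the [[target|label]] form. The function name is made up.

```python
import re

# "%" that does not start a valid two-hex-digit escape.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# "[[" followed by the link target, up to "|" or "]]".
LINK_TARGET = re.compile(r"(\[\[)([^\]|]+)")


def escape_stray_percents(text):
    """Rewrite '%' as '%25' in link targets unless it starts a valid escape."""
    def fix(match):
        return match.group(1) + STRAY_PERCENT.sub("%25", match.group(2))
    return LINK_TARGET.sub(fix, text)


print(escape_stray_percents("See [[100% Complete|status]] and [[A%3DB]]; 5% off."))
```

Text outside link targets and existing escapes such as %3D are left alone, so the change stays limited to the references the decoder actually sees.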

Yikes! Now to research what an “Incomplete trailing escape (%) pattern” is.

URLDecoder: Incomplete trailing escape (%) pattern
class org.xwiki.filter.FilterException: Failed to parse XML
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:315)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.xwiki.filter.FilterException: Failed to convert content page
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.convertToXWiki21(MediaWikiInputFilterStream.java:672)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(MediaWikiInputFilterStream.java:599)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(MediaWikiInputFilterStream.java:452)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(MediaWikiInputFilterStream.java:356)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:344)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:313)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class java.lang.IllegalArgumentException: URLDecoder: Incomplete trailing escape (%) pattern
at java.net.URLDecoder.decode(URLDecoder.java:187)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.toEntityReference(MediaWikiInputFilterStream.java:198)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiContextConverterListener.refactor(MediaWikiContextConverterListener.java:177)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiContextConverterListener.beginLink(MediaWikiContextConverterListener.java:230)
at org.xwiki.rendering.listener.WrappingListener.beginLink(WrappingListener.java:251)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.LinkEventGenerator.begin(LinkEventGenerator.java:33)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.AbstractEventGenerator.traverse(AbstractEventGenerator.java:88)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:374)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.WPListBlockEventGenerator.traverseElements(WPListBlockEventGenerator.java:142)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.WPListBlockEventGenerator.traverse(WPListBlockEventGenerator.java:82)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.WPListBlockEventGenerator.traverse(WPListBlockEventGenerator.java:50)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:421)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:389)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:367)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.traverse(EventConverter.java:356)
at org.xwiki.contrib.mediawiki.syntax.internal.parser.converter.EventConverter.nodesToText(EventConverter.java:232)
at info.bliki.wiki.model.AbstractWikiModel.render(AbstractWikiModel.java:1245)
at org.xwiki.contrib.mediawiki.syntax.internal.input.MediaWikiSyntaxInputFilterStream.read(MediaWikiSyntaxInputFilterStream.java:104)
at org.xwiki.contrib.mediawiki.syntax.internal.input.MediaWikiSyntaxInputFilterStream.read(MediaWikiSyntaxInputFilterStream.java:47)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.convertToXWiki21(MediaWikiInputFilterStream.java:670)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(MediaWikiInputFilterStream.java:599)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(MediaWikiInputFilterStream.java:452)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(MediaWikiInputFilterStream.java:356)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:344)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:313)
at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.read(MediaWikiInputFilterStream.java:84)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:220)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Would there be a way for you to share a MediaWiki package that reproduces the problem, so we can debug it, understand exactly what is going wrong and, more importantly, how to fix it? It really feels like something is wrong on the filter side (maybe something changed in the MediaWiki format).

Finally did get a complete conversion. The problems (aside from some evident garbage in a template definition) were all related to the inclusion of “%” in page titles: several instances of % followed by a space and then O or C, and one of % at the end of a title (i.e. “100%”). We’re running a fairly stock mediawiki-1.34.0 from 2019, although some of the pages with % in the titles go back a full decade to older versions. The reason we’re trying XWiki is that no one but the tech staff here has ever really embraced MediaWiki. That the conversion even works as well as it does is a big plus.

There’s internal business data in the content. Would it work for debugging the filter if I just pull out the stanzas for the pages that choked it? I can probably do that without revealing too much. I suspect there are thousands of MediaWiki instances out there where people have never happened to use “%” in page titles.

In general the best would be to create a BUG issue on https://jira.xwiki.org/browse/MEDIAWIKI with easy steps to reproduce it.

Yes, most probably, that’s why I would be very interested in a way to reproduce this problem and try to understand where exactly it’s coming from and how to fix it.

Tricky to reproduce. Pulling out the page stanzas individually where it choked and importing them individually, they work. So it’s something in the larger context which sets up the failure on some of the strings within page stanzas which include % signs. Any ideas on what else might be involved in setting up the failures? I do want to find a way to make import dependable, since our MediaWiki keeps getting updated while we decide if XWiki is a viable replacement.

I suppose I could write something that would remove just those pages with “%” anywhere in the title (and/or perhaps pages which link to them by name) out of the larger XML file to a separate file, and then import that separately, perhaps avoiding the context in which the strings are misread as “illegal hex”. Meanwhile I’ll keep looking for a way to reproduce the error without throwing the full XML at it, since full-file trial runs take a good chunk of time.
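A rough sketch of that split, using Python’s standard xml.etree.ElementTree, might look like the following. It assumes the export-0.10 namespace shown in our dump’s root element and only looks at “%” in the page title itself, not at pages linking to those titles. The function name is hypothetical, and the split-off document does not carry over the siteinfo block, which the importer may or may not need.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the root element of a MediaWiki 0.10 export.
NS = "http://www.mediawiki.org/xml/export-0.10/"


def split_percent_pages(xml_text):
    """Return (kept_xml, split_xml); pages with '%' in the title go to split_xml."""
    # Keep the export namespace as the default one when serializing.
    ET.register_namespace("", NS)
    kept = ET.fromstring(xml_text)
    # New root with the same tag and attributes as the original.
    split = ET.Element(kept.tag, kept.attrib)
    for page in kept.findall(f"{{{NS}}}page"):
        title = page.findtext(f"{{{NS}}}title", default="")
        if "%" in title:
            kept.remove(page)
            split.append(page)
    return (
        ET.tostring(kept, encoding="unicode"),
        ET.tostring(split, encoding="unicode"),
    )
```

ElementTree holds the whole document in memory, so for a very large dump a streaming approach would be the safer choice.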

Okay, this line at the top of MediaWiki’s XML export sets up the failure:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">

I can run exactly the same set of page stanzas without that line smoothly through the conversion filter. But with that line at the top, the % sign used as its ascii self gets interpreted as part of an illegal hex string. I’ll head over to your Jira…

https://jira.xwiki.org/projects/MEDIAWIKI/issues/MEDIAWIKI-103
