Filter Streams Help: MediaWiki Import

Hello All,

I’ve been testing out the Filter Streams application to import Wikipedia Pages and I’ve encountered a few issues. I really like the idea of this functionality though and think it will be very useful in future.

I am using this page as a test: EJ200, exported to XML with all version history. I’ve extracted the images and saved them in a local folder.


  • MediaWiki Thumbnail images appear full size in XWiki
  • MediaWiki Heading 1 is translated to XWiki Heading 2 and so on
  • Can’t seem to force use of absolute links. i.e. XWiki creates all links as wanted links
  • Citations text is blank in the XWiki Put Footnotes macro. Links and bullet points are present, just no text.
  • Is there a way to ask templates/macros to be skipped if an XWiki version isn’t available?

Something that is probably local to my setup and maybe something to do with 400 versions:

  • SQL Error:
Error: 1062, SQLState: 23000
Duplicate entry '-8464588504410124962-Main.Snecma_M88.WebHome' for key 'PRIMARY' 
class com.xpn.xwiki.XWikiException: Error number 3201 in 3: Exception while saving document xwiki:Main.Eurojet_EJ200
    at com.xpn.xwiki.XWiki.saveDocument(
    at com.xpn.xwiki.XWiki.saveDocument(
    at com.xpn.xwiki.internal.filter.output.DocumentInstanceOutputFilterStream.maybeSaveDocument(
    at com.xpn.xwiki.internal.filter.output.DocumentInstanceOutputFilterStream.endWikiDocumentRevision(
    at sun.reflect.GeneratedMethodAccessor2710.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.xwiki.filter.internal.FilterProxy.invoke(
    at org.xwiki.filter.internal.CompositeFilter.invoke(
    at com.sun.proxy.$Proxy2872.endWikiDocumentRevision(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor2710.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.xwiki.filter.internal.FilterProxy.invoke(
    at org.xwiki.filter.internal.CompositeFilter.invoke(
    at com.sun.proxy.$Proxy2873.endWikiDocumentRevision(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor2710.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.xwiki.filter.internal.FilterProxy.invoke(
    at org.xwiki.filter.internal.FilterProxy.invoke(
    at com.sun.proxy.$Proxy2874.endWikiDocumentRevision(Unknown Source)
    at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPageRevision(
    at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readPage(
    at org.xwiki.contrib.mediawiki.xml.internal.input.MediaWikiInputFilterStream.readMediaWiki(

Also, is there a more detailed explanation of what each input field does? Struggling to understand the full range of functionality described at:

I realise this functionality is still experimental, so I’m happy for any feedback.


I have used the extension myself and found some issues with it too, but often the behavior is unexpected but correct:

Cannot confirm this one, but maybe it is because you manually extracted the images. The “real” image folder has a very complex structure including the prerendered thumbs. Maybe you missed them?

Actually this is not true. The problem is that there is a convention in MW that all “highest level”-headings in the article are headings level two, (e.g. ==Development==, a level one heading would be =Development=) so they are correctly translated MW-level 2 to Xwiki-level 2. But Xwiki treats missing level one headings “badly”.

The problem here is that the MW syntax makes an explicit distinction between internal links ([[Word]]) and external links ([http://someurl.tld/more]). So all internal links in the aforementioned article are expected to translate to wanted links. Let’s assume you import a whole wiki and start with pageA, which links to pageB. If you changed the link to an external link and then later in the process imported pageB, the link would no longer point to pageB in the wiki but to the original source. Not what you would expect? Or maybe I got you wrong?

That would be a great feature.

There hasn’t been much work on macros in the MediaWiki filter indeed; only a few of them are really converted to XWiki equivalents right now. It would also be nice to work on a macro converter extension point like the one I did in the Confluence filter or the one the DokuWiki filter has. On the XWiki side, macros are extension points which can easily be implemented, even as an empty one, just to avoid any missing macro error while keeping the calls in case you want to implement them later.

If there are MediaWiki macros you really want to just skip, I guess we could add some list property in the input filter. You seem to want a boolean property to skip all the macros with an id that does not exist in XWiki, but that might be a bit dangerous (losing information). In any case, don’t hesitate to create a NEW FEATURE issue on the issue tracker and explain what you would like to see.
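Before deciding which MediaWiki templates to skip or map, it can help to know which ones an export actually uses. The following is a minimal Python sketch, not part of the filter: it scans raw wikitext for template calls and reports the ones missing from a set of names you consider handled. The function names and the `known` set are hypothetical, and the regular expression deliberately ignores parser functions such as {{#if:...}}.

```python
import re
from collections import Counter

# Matches the name part of a template call such as {{cite web|...}} or {{reflist}}.
# Names containing "#" or ":" (parser functions, namespaced calls) are not matched.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|#:\n]+?)\s*(?=\||\}\})")


def count_templates(wikitext: str) -> Counter:
    """Count template calls by name, upper-casing the first letter of each name."""
    counts = Counter()
    for match in TEMPLATE_NAME.finditer(wikitext):
        name = match.group(1).replace("_", " ")
        counts[name[:1].upper() + name[1:]] += 1
    return counts


def unknown_templates(wikitext: str, known: set[str]) -> list[str]:
    """Return the template names used in the text that are not in `known`."""
    return sorted(name for name in count_templates(wikitext) if name not in known)
```

Running `unknown_templates` over the text of each exported page gives a list of templates that would otherwise surface as missing macros after the import.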

Yeah, I need to spend time on it.

In the XWiki world, using a URL (if that’s what you mean by “absolute link”) to lead to another page in the same wiki is a very bad practice, which is why we tried hard to fix mistakes people often make in wikis and convert URLs into wiki links when it made sense. What is your rationale behind using a URL for this use case?

Thanks @rbr for the explanation. If using level 2 instead of level 1 in MediaWiki is such a common practice, I guess we could consider a property to convert heading level x to x-1 in the filter. Don’t hesitate to create an IMPROVEMENT issue if you feel it’s needed.

This kind of functionality is kind of always experimental, since stuff changes all the time on both the XWiki and MediaWiki sides, so converting one into the other will always have some gray areas :slight_smile:

If it were that easy. I migrated something like a few dozen pages and I have seen a wild mixture of pages starting with all kinds of heading levels (mostly H1-H3).
And mediawiki is very robust against this, so users just don’t care.
The best approach would be a kind of “dynamic promotion” approach per page, meaning that the highest level of heading in MW gets promoted to level one, second highest to level two and so forth.
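As a rough illustration of the “dynamic promotion” idea, here is a minimal Python sketch that works on raw wikitext before import. It is not what the filter does today; the function name is made up, and it assumes headings are written on their own line with the same number of `=` signs on both sides. Headings inside nowiki or pre blocks would be rewritten too, which a real implementation would need to avoid.

```python
import re

# A heading line: 1 to 6 "=" signs, the title, then the same number of "=" signs.
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def promote_headings(wikitext: str) -> str:
    """Renumber headings so the highest level used becomes 1, the next 2, and so on."""
    levels = sorted({len(m.group(1)) for m in HEADING.finditer(wikitext)})
    new_level = {old: rank for rank, old in enumerate(levels, start=1)}

    def rewrite(match: re.Match) -> str:
        marks = "=" * new_level[len(match.group(1))]
        return f"{marks} {match.group(2)} {marks}"

    return HEADING.sub(rewrite, wikitext)
```

Because the mapping is computed per page, a page that starts at level 2 and a page that starts at level 3 both end up starting at level 1. Gaps are closed as well, so a page using only levels 2 and 4 ends up with levels 1 and 2.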

If the issue on the XWiki side is just the toc macro behavior, then we could also decide to make the toc macro automatically start the <ul> at whatever is the first heading level found in the content instead of an empty first level. I doubt anyone would be against it. No idea if there is already an issue about that in the issue tracker.

Thank you very much for this. Very useful!

Sounds very possible, thanks. Is there a way to get the relevant parts of the image folder from Wikipedia rather than just downloading the images separately as I did?

I completely agree, and would normally want to preserve the internal links if I imported a whole wiki. However, for this test case I wanted the links to point to the relevant pages on Wikipedia. So I set the option below to false, thinking that it would force this, but that wasn’t the case. @tmortagne, can you comment?

I’ll add a JIRA item for this. But I think @tmortagne made a good point about having empty macro definitions created automatically which I think is kind of what happens now if you include wiki templates in the xml.

RE: Headings >>

This sounds like a robust and desirable approach as I think consistency in a wiki is very important. I’ll create a JIRA improvement for this.

I think this is also worth doing as a generic improvement. I’ve also noticed that the Toc macro displays empty bullets if a user accidentally formats a blank line as a heading, probably worth catching this. I’ll create a new JIRA improvement for this too.
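The two toc behaviours discussed above can be sketched in a few lines of Python. This is only an illustration of the logic, not the XWiki toc macro implementation: the function names are invented, headings are assumed to arrive as (level, title) pairs in document order, and the sketch uses the lowest level number found rather than strictly the first one, so that no entry ends up with a negative depth.

```python
def toc_entries(headings):
    """Return (depth, title) pairs, dropping blank headings.

    Depth 0 is assigned to the highest heading level actually used in the
    content, so no empty outer list level is produced.
    """
    kept = [(level, title.strip()) for level, title in headings if title.strip()]
    if not kept:
        return []
    top = min(level for level, _ in kept)
    return [(level - top, title) for level, title in kept]


def render_toc(headings):
    """Render the entries as an indented plain-text bullet list."""
    return "\n".join("  " * depth + "• " + title for depth, title in toc_entries(headings))
```

For a page whose first heading is level 2, the entries start at depth 0 instead of being nested under an empty first-level bullet, and a blank line accidentally formatted as a heading produces no bullet at all.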

I’m not sure if this is the “right” approach. If you move content from MW to XWiki you should try to adhere as much as possible to the target structure. So if you make the TOC macro more robust to this kind of “wrong-doing”, people will not care again, just as they did not in MW before. Empty bullets are a nice “reminder” that something is just wrong.

Also I never was a friend of the MW convention anyhow because it felt unnatural and pushed the burden to solve a technical problem (getting formatting of heading levels right) on the user. And starting something with 2 feels just unnatural (actually starting things with 1 is also, if you are an IT guy :wink: ).

I think I agree


This actually does not have anything to do with URLs. In XWiki, “absolute reference” (need to fix a typo, it seems) means a complete wiki reference, as opposed to a reference relative to the current document.

So back to your issue: you say you have URLs to Wikipedia pages (so nothing to do with the wiki you exported, right?) which are converted to wiki links, ending up as dead links? That sounds like quite a bug; it would be great if you could create a jira issue with a package to reproduce it.

I think I just misunderstood the input. The links are indeed internal Wikipedia links, and I wanted them to point to the original article on Wikipedia once imported into XWiki, but now that I think about it that’s impossible as the xml has no information on the full URL. It’s fine, ignore this, it was a very particular use case. I can always manipulate the xml in Python if I ever need this.
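For anyone who does need this, a minimal Python sketch of such a pre-processing step might look like the following. It rewrites internal links in a page’s wikitext into external links pointing back at the source wiki. The base URL is an assumption (English Wikipedia), the function name is made up, and links with a namespace prefix (File:, Category:, ...) or a section anchor are deliberately left untouched.

```python
import re
from urllib.parse import quote

# Assumption: the export came from English Wikipedia; adjust for another wiki.
BASE_URL = "https://en.wikipedia.org/wiki/"

# [[Target]] or [[Target|label]]. Targets containing ":" or "#" do not match.
INTERNAL_LINK = re.compile(r"\[\[([^\[\]|:#]+)(?:\|([^\[\]]*))?\]\]")


def externalise_links(wikitext: str) -> str:
    """Turn [[Target|label]] into [BASE_URL/Target label]."""

    def rewrite(match: re.Match) -> str:
        target = match.group(1).strip()
        label = match.group(2) or target
        url = BASE_URL + quote(target.replace(" ", "_"))
        return f"[{url} {label}]"

    return INTERNAL_LINK.sub(rewrite, wikitext)
```

The function would be applied to the text of each revision in the export before running the import, so the filter sees ordinary external links.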

OK, always glad to have less things to fix :slight_smile:

Maybe you’re talking about interwiki links? XWiki does have support for that. So if there’s a prefix to know it’s a Wikipedia link, you could configure the interwiki prefix for Wikipedia in XWiki’s config. Search for “interwiki” in the XWiki documentation, for example.

Config for interwiki links in XWiki:

But make sure to change the link definitions also from [[…]] to […] if you use external links, else the importer crashes horribly.
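A small Python sketch of that clean-up, to be run over the wikitext before import, could look like this. It only covers the case described above (a URL wrapped in double brackets, optionally with a pipe-separated label); the function name is hypothetical and the set of URL schemes is an assumption.

```python
import re

# [[http://example.org]] or [[https://example.org|label]]
BRACKETED_URL = re.compile(
    r"\[\[\s*((?:https?|ftp)://[^\s\[\]|]+)\s*(?:\|\s*([^\[\]]*?)\s*)?\]\]"
)


def fix_bracketed_urls(wikitext: str) -> str:
    """Rewrite [[URL|label]] as [URL label]; real internal links are untouched."""

    def rewrite(match: re.Match) -> str:
        url, label = match.group(1), match.group(2)
        return f"[{url} {label}]" if label else f"[{url}]"

    return BRACKETED_URL.sub(rewrite, wikitext)
```

Internal links such as [[Word]] do not start with a URL scheme, so they pass through unchanged.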

You sure about that? It should just assume the URL is the name of the page and produce a bad wiki link.

I had a few cases where I had links like [[]] and the importer crashed. Took me a while to figure this out. I will try to reproduce this and open an issue. To be honest, I forgot to report it when I migrated.

Thanks !


Is the below a bug, or am I doing something wrong? I.e. would this populate if I included the templates in the exported xml?

Sorry not exactly a mediawiki expert so not really sure what you are talking about. Can you find the text you are expecting anywhere in the XML ?

Ideally when you don’t find on XWiki side something you have on MediaWiki side and you are more or less sure that content is in some form or another in the XML or the media files (or some other place the filter does not yet support and should) it would be great to create a jira issue with all the information and data to reproduce it. Then it’s much easier to discuss it :slight_smile:

For the benefit of people who come across this thread in future, these are the issues that were created:

  • Dynamically Promote Heading Levels
  • Ability to automatically remove unknown macros/templates
  • PutFootnotes macro: Automatically add heading if missing
  • Add reflist to putfootnotes macro mapping
  • Table of Contents Macro: Handle empty headings