Filter Streams Help: MediaWiki Import

There hasn’t been much work on macros in the MediaWiki filter indeed; only a few of them are really converted to XWiki equivalents right now. It would also be nice to work on a macro converter extension point like the one I did in the Confluence filter or the one the DokuWiki filter has. Now on XWiki side macros are extension points which can easily be implemented (see https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/Tutorials/WritingMacros/), even an empty one, just to avoid getting any missing macro error while keeping the macro calls in case you want to implement them later.

If there are MediaWiki macros you really want to just skip, I guess we could add some list property in the input filter. You seem to want a boolean property to skip all the macros with an id that does not exist in XWiki, but that might be a bit dangerous (losing information). In any case don’t hesitate to create a NEW FEATURE issue on https://jira.xwiki.org/browse/MEDIAWIKI and explain what you would like to see.

Yeah, I need to spend time on it.

In the XWiki world, using a URL (if that’s what you mean by “absolute link”) to lead to another page in the same wiki is a very bad practice, which is why we tried hard to fix mistakes people often make in wikis and convert URLs into wiki links when it made sense. What is your rationale behind using a URL for this use case?

Thanks @rbr for the explanation. If using level 2 instead of level 1 in MediaWiki is such a common practice, I guess we could consider a property to convert heading level x to x-1 in the filter. Don’t hesitate to create an IMPROVEMENT issue if you feel it’s needed.

This kind of functionality is kind of always experimental since stuff changes all the time on XWiki and MediaWiki sides, so converting one into another will always have some gray areas :slight_smile:

If it were that easy. I migrated something like a few dozen pages and I have seen a wild mixture of pages starting with all kinds of heading levels (mostly H1-H3).
And mediawiki is very robust against this, so users just don’t care.
The best approach would be a kind of “dynamic promotion” approach per page, meaning that the highest level of heading in MW gets promoted to level one, second highest to level two and so forth.
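
To make the “dynamic promotion” idea concrete, here is a minimal Python sketch of it applied to raw MediaWiki markup. It is only an illustration of the idea, not how the filter works: the function name `promote_headings` and the heading regex are my own assumptions, and it ignores edge cases such as headings inside `<nowiki>` blocks.

```python
import re

# A MediaWiki heading line: the same run of 1-6 "=" on both sides of the title.
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def promote_headings(wikitext: str) -> str:
    """Renumber the heading levels used in one page so they start at 1.

    The highest level found becomes level 1, the second highest level 2,
    and so on (hypothetical helper, per-page only).
    """
    # Distinct heading levels actually used in this page, e.g. [2, 3, 4].
    levels = sorted({len(m.group(1)) for m in HEADING.finditer(wikitext)})
    # Map each used level to its rank: {2: 1, 3: 2, 4: 3}.
    rank = {level: index + 1 for index, level in enumerate(levels)}

    def rewrite(match: re.Match) -> str:
        marks = "=" * rank[len(match.group(1))]
        return f"{marks} {match.group(2)} {marks}"

    return HEADING.sub(rewrite, wikitext)


page = "== Intro ==\ntext\n==== Deep ====\n=== Mid ===\n"
print(promote_headings(page))
```

Note that this also closes gaps: a page using only levels 2 and 4 ends up with levels 1 and 2.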

If the issue on XWiki side is just the toc macro behavior then we could also decide to make the toc macro automatically start the <ul> at whatever is the first heading level found in the content instead of an empty first level. I doubt anyone would be against it. No idea if there is already an issue about that on https://jira.xwiki.org/browse/XWIKI.

@rbr,
Thank you very much for this. Very useful!

Sounds very possible, thanks. Is there a way to get the relevant parts of the image folder from Wikipedia rather than just downloading the images separately as I did?

I completely agree, and would normally want to preserve the internal links if I imported a whole wiki. However, for this test case I wanted the links to point to the relevant pages on Wikipedia. So I set the below to false thinking that it would force this, but that wasn’t the case. @tmortagne, can you comment?

I’ll add a JIRA item for this. But I think @tmortagne made a good point about having empty macro definitions created automatically which I think is kind of what happens now if you include wiki templates in the xml.

RE: Headings >>

This sounds like a robust and desirable approach as I think consistency in a wiki is very important. I’ll create a JIRA improvement for this.

I think this is also worth doing as a generic improvement. I’ve also noticed that the Toc macro displays empty bullets if a user accidentally formats a blank line as a heading, probably worth catching this. I’ll create a new JIRA improvement for this too.

I’m not sure if this is the “right” approach. If you move content from MW to XWiki you should try to adhere as much as possible to the target structure. So if you make the TOC macro more robust to this kind of “wrong-doing”, people will not care again, as they did in MW before. Empty bullets are a nice “reminder” that something is just wrong.

Also I never was a friend of the MW convention anyhow because it felt unnatural and pushed the burden to solve a technical problem (getting formatting of heading levels right) on the user. And starting something with 2 feels just unnatural (actually starting things with 1 is also, if you are an IT guy :wink: ).

I think I agree

LOL

This actually does not have anything to do with URLs. In XWiki “absolute reference” (need to fix a typo it seems) means a complete wiki reference, compared to a reference relative to the current document.

So back to your issue: you say you have URLs to Wikipedia pages (so nothing to do with the wiki you exported, right?) which are converted to wiki links ending up as dead links? That sounds like quite a bug, would be great if you could create a jira issue with a package to reproduce it.

I think I just misunderstood the input. The links are indeed internal Wikipedia links, and I wanted them to point to the original article on Wikipedia once imported into XWiki, but now that I think about it that’s impossible as the xml has no information on the full URL. It’s fine, ignore this, it was a very particular use case. I can always manipulate the xml in Python if I ever need this.
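
For anyone with the same use case, a rough sketch of that kind of Python pre-processing could look like the following. It rewrites internal links in the wikitext of a page into external links before import. The base URL, the function name `externalize_links` and the rule “leave anything with a namespace prefix alone” are all assumptions for illustration; the function would have to be applied to the wikitext of each page in the export.

```python
import re
from urllib.parse import quote

# Assumption: the export came from English Wikipedia.
WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"

# [[Target]] or [[Target|label]]; no nested brackets in the target.
INTERNAL_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def externalize_links(wikitext: str, base: str = WIKIPEDIA_BASE) -> str:
    """Turn internal wiki links into external links to the source wiki."""

    def rewrite(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Leave File:, Category:, interwiki prefixes etc. untouched.
        if ":" in target:
            return match.group(0)
        url = base + quote(target.replace(" ", "_"), safe="/#:()")
        return f"[{url} {label}]"

    return INTERNAL_LINK.sub(rewrite, wikitext)


print(externalize_links("See [[Albert Einstein|Einstein]] and [[Physics]]."))
```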

OK, always glad to have fewer things to fix :slight_smile:

Maybe you’re talking about interwiki links? XWiki does have support for that. So if there’s a prefix to know it’s a wikipedia link, you could configure the interwiki prefix to wikipedia in XWiki’s config. Search for interwiki on https://www.xwiki.org/xwiki/bin/view/Documentation/UserGuide/Features/XWikiSyntax/?syntax=2.1&section=Links for example.

Config for interwiki links in XWiki: https://www.xwiki.org/xwiki/bin/view/Documentation/AdminGuide/Configuration/#HConfiguringInterwikilinks

But make sure to change the link definitions also from [[…]] to […] if you use external links, else the importer crashes horribly.

You sure about that? It should just assume the URL is the name of the page and produce a bad wiki link.

I had a few cases where I had links like [[http://example.com]] and the importer crashed. Took me a while to figure this out. I will try to reproduce this and open an issue. To be honest I forgot to report it when I migrated.
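
Until this is reproduced and fixed, a workaround is to clean up such links in the wikitext before importing. A minimal Python sketch, assuming the problem really is limited to URLs wrapped in double brackets (the function name `fix_external_links` and the list of URL schemes are my own choices):

```python
import re

# [[http://...]], optionally followed by "|label" or " label".
DOUBLE_BRACKET_URL = re.compile(
    r"\[\[\s*((?:https?|ftp)://[^\[\]|\s]+)"
    r"(?:(?:\s*\|\s*|\s+)([^\[\]]*?))?\s*\]\]"
)


def fix_external_links(wikitext: str) -> str:
    """Rewrite [[URL]] into the single-bracket external link form [URL]."""

    def rewrite(match: re.Match) -> str:
        url, label = match.group(1), match.group(2)
        return f"[{url} {label}]" if label else f"[{url}]"

    return DOUBLE_BRACKET_URL.sub(rewrite, wikitext)


print(fix_external_links("Docs: [[http://example.com]] and [[Main Page]]"))
```

Real internal links such as [[Main Page]] are left alone, since they do not start with a URL scheme.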

Thanks !

@tmortagne,

Is the below a bug, or am I doing something wrong? I.e. would this populate if I included the templates in the exported xml?

Sorry, I’m not exactly a MediaWiki expert, so I’m not really sure what you are talking about. Can you find the text you are expecting anywhere in the XML?

Ideally when you don’t find on XWiki side something you have on MediaWiki side and you are more or less sure that content is in some form or another in the XML or the media files (or some other place the filter does not yet support and should) it would be great to create a jira issue with all the information and data to reproduce it. Then it’s much easier to discuss it :slight_smile:

For the benefit of people that come across this thread in future:
Dynamically Promote Heading Levels: https://jira.xwiki.org/browse/MEDIAWIKI-87
Ability to automatically remove unknown macros/templates: https://jira.xwiki.org/browse/MEDIAWIKI-88
PutFootnotes macro: Automatically add heading if missing: https://jira.xwiki.org/browse/XRENDERING-520
Add reflist to putfootnotes macro mapping: https://jira.xwiki.org/browse/MEDIAWIKI-90
Table of Contents Macro: Handle empty headings: https://jira.xwiki.org/browse/XRENDERING-521

As we already have such a nice thread about the mediawiki importer: If there is an error would it be possible to print the snippet of mw markup that the importer choked on? Currently I could not find the snippet mentioned.

@rbr, not sure which snippet you’re referring to…